Executive Summary

This analysis segments customers using RFM (Recency, Frequency, Monetary) methodology to identify high-value customer groups and inform targeted marketing strategies.

Key Objectives: - Segment customers based on purchasing behavior - Identify Champions (best customers) and At-Risk customers - Provide actionable recommendations for each segment - Quantify revenue opportunity by segment

Business Impact: [Will be filled after completing analysis]


Introduction to RFM Analysis

What is RFM?

RFM is a customer segmentation technique that analyzes three key behavioral dimensions:

Recency (R): How recently did the customer make a purchase? - Measured as: Days since last purchase - Business logic: Recent customers are more engaged and likely to purchase again - Scoring: Lower days = higher score (more recent = better)

Frequency (F): How often does the customer purchase? - Measured as: Total number of transactions - Business logic: Frequent buyers demonstrate loyalty and product satisfaction - Scoring: Higher transaction count = higher score

Monetary (M): How much does the customer spend? - Measured as: Total revenue generated - Business logic: High spenders are most valuable to the business - Scoring: Higher spend = higher score

Why RFM Matters for E-Commerce

  • Targeted Marketing: Different segments require different marketing approaches
  • Budget Optimization: Allocate resources to highest-value customers
  • Churn Prevention: Identify at-risk customers before they leave
  • Revenue Growth: Focus on moving customers to higher-value segments
  • Customer Lifetime Value: Predict future value based on current behavior

Our Methodology

Scoring Approach: Quintile-based (1-5 scale) - Divide customers into 5 equal groups for each metric - Assign scores 1 (lowest) to 5 (highest) - Combine into 3-digit RFM code (e.g., “543”)

Segmentation: 11 distinct customer segments - Based on RFM score patterns - Each segment has specific characteristics and recommended actions


Setup and Configuration

# Data manipulation and analysis
library(tidyverse)      # Core data manipulation
library(lubridate)      # Date handling
library(here)           # File paths

# Visualization
library(scales)         # Number formatting
library(viridis)        # Color palettes
library(ggthemes)       # Additional themes

# Tables and reporting
library(knitr)          # Table formatting
library(kableExtra)     # Enhanced tables
library(DT)             # Interactive tables

Packages loaded: All necessary tools for RFM analysis and visualization.


Data Loading and Validation

Load Customer Summary Data

# Load the pre-calculated customer summary from data cleaning
customer_summary <- read_csv(here("data", "processed", "customer_summary.csv"))

# Display confirmation
cat("Customer summary data loaded successfully!\n")
## Customer summary data loaded successfully!
cat("Total customers:", nrow(customer_summary), "\n")
## Total customers: 4371
cat("Columns:", ncol(customer_summary), "\n")
## Columns: 15

Data loaded: Customer summary with pre-calculated metrics from our data cleaning process.

Initial Data Inspection

# Display structure
glimpse(customer_summary)
## Rows: 4,371
## Columns: 15
## $ CustomerID            <dbl> 12346, 12347, 12348, 12349, 12350, 12352, 12353,…
## $ FirstPurchaseDate     <dttm> 2011-01-18 10:01:00, 2010-12-07 14:57:00, 2010-…
## $ LastPurchaseDate      <dttm> 2011-01-18 10:17:00, 2011-12-07 15:52:00, 2011-…
## $ TotalTransactions     <dbl> 1, 7, 4, 1, 1, 8, 1, 1, 1, 3, 1, 2, 4, 3, 1, 10,…
## $ TotalReturns          <dbl> 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 3, …
## $ TotalItemsPurchased   <dbl> 74215, 2458, 2341, 631, 197, 536, 20, 530, 240, …
## $ TotalItemsReturned    <dbl> 74215, 0, 0, 0, 0, 66, 0, 0, 0, 0, 0, 0, 10, 0, …
## $ TotalSpent            <dbl> 77183.60, 4310.00, 1797.24, 1757.55, 334.40, 250…
## $ TotalReturnValue      <dbl> 77183.60, 0.00, 0.00, 0.00, 0.00, 960.63, 0.00, …
## $ NetSpent              <dbl> 0.00, 4310.00, 1797.24, 1757.55, 334.40, 1545.41…
## $ AverageOrderValue     <dbl> 77183.6000, 615.7143, 449.3100, 1757.5500, 334.4…
## $ CustomerLifetimeDays  <dbl> 1.111111e-02, 3.650382e+02, 2.827528e+02, 0.0000…
## $ ReturnRate            <dbl> 1.000, 0.000, 0.000, 0.000, 0.000, 0.375, 0.000,…
## $ PrimaryCountry        <chr> "United Kingdom", "Iceland", "Finland", "Italy",…
## $ DaysSinceLastPurchase <dbl> 325.106250, 1.873611, 74.984028, 18.124306, 309.…

Structure check: Verify all expected columns are present and have correct data types.

# Preview first 10 customers
head(customer_summary, 10) %>%
  kable(caption = "Sample Customer Records") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 11) %>%
  scroll_box(width = "100%")
Sample Customer Records
CustomerID FirstPurchaseDate LastPurchaseDate TotalTransactions TotalReturns TotalItemsPurchased TotalItemsReturned TotalSpent TotalReturnValue NetSpent AverageOrderValue CustomerLifetimeDays ReturnRate PrimaryCountry DaysSinceLastPurchase
12346 2011-01-18 10:01:00 2011-01-18 10:17:00 1 1 74215 74215 77183.60 77183.60 0.00 77183.6000 0.0111111 1.000 United Kingdom 325.106250
12347 2010-12-07 14:57:00 2011-12-07 15:52:00 7 0 2458 0 4310.00 0.00 4310.00 615.7143 365.0381944 0.000 Iceland 1.873611
12348 2010-12-16 19:09:00 2011-09-25 13:13:00 4 0 2341 0 1797.24 0.00 1797.24 449.3100 282.7527778 0.000 Finland 74.984028
12349 2011-11-21 09:51:00 2011-11-21 09:51:00 1 0 631 0 1757.55 0.00 1757.55 1757.5500 0.0000000 0.000 Italy 18.124306
12350 2011-02-02 16:01:00 2011-02-02 16:01:00 1 0 197 0 334.40 0.00 334.40 334.4000 0.0000000 0.000 Norway 309.867361
12352 2011-02-16 12:33:00 2011-11-03 14:37:00 8 3 536 66 2506.04 960.63 1545.41 313.2550 260.0861111 0.375 Norway 35.925694
12353 2011-05-19 17:47:00 2011-05-19 17:47:00 1 0 20 0 89.00 0.00 89.00 89.0000 0.0000000 0.000 Bahrain 203.793750
12354 2011-04-21 13:11:00 2011-04-21 13:11:00 1 0 530 0 1079.40 0.00 1079.40 1079.4000 0.0000000 0.000 Spain 231.985417
12355 2011-05-09 13:49:00 2011-05-09 13:49:00 1 0 240 0 459.40 0.00 459.40 459.4000 0.0000000 0.000 Bahrain 213.959028
12356 2011-01-18 09:50:00 2011-11-17 08:40:00 3 0 1591 0 2811.43 0.00 2811.43 937.1433 302.9513889 0.000 Portugal 22.173611

Data preview: Sample of customer records showing all calculated metrics.

Data Validation

# Check for missing values in critical RFM columns
cat("Missing Value Check:\n")
## Missing Value Check:
cat("DaysSinceLastPurchase:", sum(is.na(customer_summary$DaysSinceLastPurchase)), "\n")
## DaysSinceLastPurchase: 0
cat("TotalTransactions:", sum(is.na(customer_summary$TotalTransactions)), "\n")
## TotalTransactions: 0
cat("TotalSpent:", sum(is.na(customer_summary$TotalSpent)), "\n\n")
## TotalSpent: 0
# Check for invalid values (negative, zero, etc.)
cat("Data Range Validation:\n")
## Data Range Validation:
cat("Recency (days) range:", min(customer_summary$DaysSinceLastPurchase), "to", 
    max(customer_summary$DaysSinceLastPurchase), "\n")
## Recency (days) range: 0 to 373.1229
cat("Frequency range:", min(customer_summary$TotalTransactions), "to", 
    max(customer_summary$TotalTransactions), "\n")
## Frequency range: 0 to 210
cat("Monetary range: £", sprintf("%.2f", min(customer_summary$TotalSpent)), "to £", 
    sprintf("%.2f", max(customer_summary$TotalSpent)), "\n\n")
## Monetary range: £ 0.00 to £ 280206.02
# Check for outliers using IQR method
check_outliers <- function(x, metric_name) {
  Q1 <- quantile(x, 0.25)
  Q3 <- quantile(x, 0.75)
  IQR <- Q3 - Q1
  outliers <- sum(x < (Q1 - 1.5*IQR) | x > (Q3 + 1.5*IQR))
  cat(metric_name, "outliers:", outliers, 
      sprintf("(%.2f%%)", outliers/length(x)*100), "\n")
}

cat("Outlier Detection:\n")
## Outlier Detection:
check_outliers(customer_summary$DaysSinceLastPurchase, "Recency")
## Recency outliers: 145 (3.32%)
check_outliers(customer_summary$TotalTransactions, "Frequency")
## Frequency outliers: 285 (6.52%)
check_outliers(customer_summary$TotalSpent, "Monetary")
## Monetary outliers: 429 (9.81%)

Validation complete: Data quality checks ensure RFM metrics are valid and ready for analysis.

Summary Statistics

# Calculate summary statistics for RFM metrics
summary_table <- customer_summary %>%
  summarize(
    Metric = c("Recency (days)", "Frequency (transactions)", "Monetary (£)"),
    Mean = c(mean(DaysSinceLastPurchase), mean(TotalTransactions), mean(TotalSpent)),
    Median = c(median(DaysSinceLastPurchase), median(TotalTransactions), median(TotalSpent)),
    Min = c(min(DaysSinceLastPurchase), min(TotalTransactions), min(TotalSpent)),
    Max = c(max(DaysSinceLastPurchase), max(TotalTransactions), max(TotalSpent)),
    SD = c(sd(DaysSinceLastPurchase), sd(TotalTransactions), sd(TotalSpent))
  )

summary_table %>%
  mutate(across(where(is.numeric), ~round(., 2))) %>%
  kable(caption = "RFM Metrics Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
RFM Metrics Summary Statistics
Metric Mean Median Min Max SD
Recency (days) 91.59 49.87 0 373.12 100.78
Frequency (transactions) 4.24 2.00 0 210.00 7.68
Monetary (£) 2033.22 659.46 0 280206.02 8953.00

Summary statistics: Overview of RFM metric distributions before scoring.


RFM Score Calculation

Now we’ll calculate RFM scores using the quintile method, dividing customers into 5 equal groups for each metric.

Recency Score Calculation

# Calculate Recency score (REVERSE: lower days = higher score)
# We use desc() because recent customers (low days) should get high scores
customer_rfm <- customer_summary %>%
  mutate(
    # Assign quintile scores 1-5 (5 = most recent)
    R_Score = ntile(desc(DaysSinceLastPurchase), 5)
  )

# Verify the scoring logic
recency_check <- customer_rfm %>%
  group_by(R_Score) %>%
  summarize(
    Customers = n(),
    Min_Days = min(DaysSinceLastPurchase),
    Max_Days = max(DaysSinceLastPurchase),
    Avg_Days = round(mean(DaysSinceLastPurchase), 1)
  ) %>%
  arrange(desc(R_Score))  # Show best (5) to worst (1)

recency_check %>%
  kable(caption = "Recency Score Distribution", 
        col.names = c("R Score", "Customers", "Min Days", "Max Days", "Avg Days")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Recency Score Distribution
R Score Customers Min Days Max Days Avg Days
5 874 0.00000 10.98889 5.1
4 874 10.99722 31.03403 21.0
3 874 31.04514 70.78958 49.1
2 874 70.80347 177.95556 113.8
1 875 178.03472 373.12292 268.7

Recency scoring complete: Customers divided into 5 groups. Score 5 = purchased most recently (best), Score 1 = purchased long ago (worst).

Key observation: Notice how R_Score 5 has the lowest “Avg Days” (most recent customers).

Frequency Score Calculation

# Calculate Frequency score (DIRECT: higher transactions = higher score)
customer_rfm <- customer_rfm %>%
  mutate(
    # Assign quintile scores 1-5 (5 = most frequent)
    F_Score = ntile(TotalTransactions, 5)
  )

# Verify the scoring logic
frequency_check <- customer_rfm %>%
  group_by(F_Score) %>%
  summarize(
    Customers = n(),
    Min_Transactions = min(TotalTransactions),
    Max_Transactions = max(TotalTransactions),
    Avg_Transactions = round(mean(TotalTransactions), 1)
  ) %>%
  arrange(desc(F_Score))  # Show best (5) to worst (1)

frequency_check %>%
  kable(caption = "Frequency Score Distribution",
        col.names = c("F Score", "Customers", "Min Txns", "Max Txns", "Avg Txns")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Frequency Score Distribution
F Score Customers Min Txns Max Txns Avg Txns
5 874 5 210 12.7
4 874 3 5 4.0
3 874 2 3 2.3
2 874 1 2 1.3
1 875 0 1 1.0

Frequency scoring complete: Score 5 = highest transaction count (most loyal), Score 1 = fewest transactions.

Key observation: F_Score 5 customers have significantly more transactions than lower scores.

Monetary Score Calculation

# Calculate Monetary score (DIRECT: higher spend = higher score)
customer_rfm <- customer_rfm %>%
  mutate(
    # Assign quintile scores 1-5 (5 = highest spend)
    M_Score = ntile(TotalSpent, 5)
  )

# Verify the scoring logic
monetary_check <- customer_rfm %>%
  group_by(M_Score) %>%
  summarize(
    Customers = n(),
    Min_Spend = min(TotalSpent),
    Max_Spend = max(TotalSpent),
    Avg_Spend = mean(TotalSpent),
    Total_Revenue = sum(TotalSpent)
  ) %>%
  arrange(desc(M_Score))  # Show best (5) to worst (1)

monetary_check %>%
  mutate(
    Min_Spend = dollar(Min_Spend, prefix = "£"),
    Max_Spend = dollar(Max_Spend, prefix = "£"),
    Avg_Spend = dollar(Avg_Spend, prefix = "£"),
    Total_Revenue = dollar(Total_Revenue, prefix = "£")
  ) %>%
  kable(caption = "Monetary Score Distribution",
        col.names = c("M Score", "Customers", "Min Spend", "Max Spend", "Avg Spend", "Total Revenue")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Monetary Score Distribution
M Score Customers Min Spend Max Spend Avg Spend Total Revenue
5 874 £2,045.53 £280,206 £7,608.24 £6,649,600
4 874 £924.85 £2,045 £1,388.15 £1,213,245
3 874 £481.49 £925 £676.34 £591,123
2 874 £242.35 £481 £351.47 £307,187
1 875 £0.00 £242 £144.06 £126,053

Monetary scoring complete: Score 5 = highest spenders (most valuable), Score 1 = lowest spenders.

Critical insight: Notice the revenue concentration. M_Score 5 customers likely contribute disproportionate revenue.

Combined RFM Scores

# Create combined RFM score and total score
customer_rfm <- customer_rfm %>%
  mutate(
    # Concatenate scores into 3-digit code (e.g., "543")
    RFM_Score = paste0(R_Score, F_Score, M_Score),
    
    # Calculate total score (sum of individual scores, range 3-15)
    RFM_Total = R_Score + F_Score + M_Score
  )

# Display score distribution summary
cat("RFM Score Creation Complete:\n")
## RFM Score Creation Complete:
cat("Total unique RFM combinations:", n_distinct(customer_rfm$RFM_Score), "\n")
## Total unique RFM combinations: 118
cat("RFM Total score range:", min(customer_rfm$RFM_Total), "to", max(customer_rfm$RFM_Total), "\n\n")
## RFM Total score range: 3 to 15
# Show most common RFM scores
cat("Top 10 Most Common RFM Scores:\n")
## Top 10 Most Common RFM Scores:
customer_rfm %>%
  count(RFM_Score, sort = TRUE) %>%
  head(10) %>%
  mutate(Percentage = round(n / nrow(customer_rfm) * 100, 2)) %>%
  kable(col.names = c("RFM Score", "Customers", "% of Total")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
RFM Score Customers % of Total
555 352 8.05
111 197 4.51
455 176 4.03
121 158 3.61
112 122 2.79
444 105 2.40
544 95 2.17
122 92 2.10
233 92 2.10
344 89 2.04

Combined RFM scores created: Each customer now has a unique 3-digit RFM code representing their behavior profile.

RFM Score Distribution Analysis

# Analyze the distribution of total RFM scores
rfm_total_dist <- customer_rfm %>%
  count(RFM_Total) %>%
  mutate(
    Percentage = round(n / sum(n) * 100, 2),
    Cumulative_Pct = round(cumsum(n) / sum(n) * 100, 2)
  )

rfm_total_dist %>%
  kable(caption = "Distribution of Total RFM Scores",
        col.names = c("Total Score", "Customers", "% of Total", "Cumulative %")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Distribution of Total RFM Scores
Total Score Customers % of Total Cumulative %
3 197 4.51 4.51
4 364 8.33 12.83
5 344 7.87 20.70
6 404 9.24 29.95
7 386 8.83 38.78
8 383 8.76 47.54
9 343 7.85 55.39
10 347 7.94 63.33
11 357 8.17 71.49
12 303 6.93 78.43
13 298 6.82 85.24
14 293 6.70 91.95
15 352 8.05 100.00

Score distribution: Shows how customers are distributed across the RFM score spectrum.

# Preview customers with their RFM scores
customer_rfm %>%
  select(CustomerID, DaysSinceLastPurchase, TotalTransactions, TotalSpent,
         R_Score, F_Score, M_Score, RFM_Score, RFM_Total) %>%
  arrange(desc(RFM_Total)) %>%
  head(20) %>%
  mutate(TotalSpent = dollar(TotalSpent, prefix = "£")) %>%
  kable(caption = "Top 20 Customers by RFM Total Score") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 11) %>%
  scroll_box(width = "100%")
Top 20 Customers by RFM Total Score
CustomerID DaysSinceLastPurchase TotalTransactions TotalSpent R_Score F_Score M_Score RFM_Score RFM_Total
12347 1.8736111 7 £4,310.00 5 5 5 555 15
12362 2.8819444 10 £5,226.23 5 5 5 555 15
12417 2.9152778 9 £3,649.10 5 5 5 555 15
12433 0.1166667 7 £13,375.87 5 5 5 555 15
12437 1.0520833 18 £4,951.41 5 5 5 555 15
12471 1.8798611 30 £19,788.65 5 5 5 555 15
12476 0.9458333 11 £6,816.42 5 5 5 555 15
12490 4.8631944 10 £5,417.93 5 5 5 555 15
12524 8.8881944 8 £4,485.72 5 5 5 555 15
12553 7.8416667 10 £3,692.28 5 5 5 555 15
12562 7.8125000 7 £3,781.74 5 5 5 555 15
12569 1.8513889 32 £4,124.69 5 5 5 555 15
12583 2.1965278 15 £7,281.38 5 5 5 555 15
12584 3.1354167 9 £2,338.15 5 5 5 555 15
12598 9.0222222 8 £3,023.08 5 5 5 555 15
12621 1.0590278 20 £13,689.67 5 5 5 555 15
12627 10.0298611 7 £4,478.53 5 5 5 555 15
12662 0.0354167 11 £3,849.78 5 5 5 555 15
12664 7.8215278 9 £4,881.88 5 5 5 555 15
12682 3.1180556 31 £12,279.82 5 5 5 555 15

Best customers preview: Top-scoring customers based on combined RFM metrics.


Customer Segmentation

Now we’ll assign customers to meaningful business segments based on their RFM score patterns.

Segment Definitions

We’ll create 11 distinct customer segments based on proven RFM segmentation methodology:

# Define segment assignment based on RFM score patterns
customer_rfm <- customer_rfm %>%
  mutate(
    Segment = case_when(
      # Champions: Best customers (bought recently, buy often, spend most)
      RFM_Score %in% c("555", "554", "544", "545", "454", "455", "445") ~ "Champions",
      
      # Loyal Customers: Regular, reliable customers
      RFM_Score %in% c("543", "444", "435", "355", "354", "345", "344", "335") ~ "Loyal Customers",
      
      # Potential Loyalists: Recent customers with average frequency/spend
      RFM_Score %in% c("553", "551", "552", "541", "542", "533", "532", "531", 
                       "452", "451", "442", "441", "431", "453", "433", "432", 
                       "423", "353", "352", "351", "342", "341", "333", "323") ~ "Potential Loyalists",
      
      # New Customers: Recent first-time buyers
      RFM_Score %in% c("512", "511", "422", "421", "412", "411", "311") ~ "New Customers",
      
      # Promising: Recent but low frequency/spend
      RFM_Score %in% c("525", "524", "523", "522", "521", "515", "514", "513",
                       "425", "424", "413", "414", "415", "315", "314", "313") ~ "Promising",
      
      # Need Attention: Good customers who haven't purchased recently
      RFM_Score %in% c("535", "534", "443", "434", "343", "334", "325", "324") ~ "Need Attention",
      
      # About to Sleep: Below average recency, frequency, monetary
      RFM_Score %in% c("331", "321", "312", "221", "213", "231", "241", "251") ~ "About to Sleep",
      
      # At Risk: Spent big, purchased often, but long time ago
      RFM_Score %in% c("255", "254", "245", "244", "253", "252", "243", "242",
                       "235", "234", "225", "224", "153", "152", "145", "143",
                       "142", "135", "134", "133", "125", "124") ~ "At Risk",
      
      # Can't Lose Them: Made big purchases, haven't returned
      RFM_Score %in% c("155", "154", "144", "214", "215", "115", "114", "113") ~ "Can't Lose Them",
      
      # Hibernating: Last purchase long ago, low spend/frequency
      RFM_Score %in% c("332", "322", "233", "232", "223", "222", "132", "123", "122", "212", "211") ~ "Hibernating",
      
      # Lost: Lowest recency, frequency, monetary
      RFM_Score %in% c("111", "112", "121", "131", "141", "151") ~ "Lost",
      
      # Catch-all (should be minimal)
      TRUE ~ "Other"
    )
  )

# Verify segmentation coverage
cat("Segmentation Complete!\n")
## Segmentation Complete!
cat("Customers assigned to 'Other':", sum(customer_rfm$Segment == "Other"), "\n")
## Customers assigned to 'Other': 0
cat("This should be 0 or very close to 0.\n")
## This should be 0 or very close to 0.

Segmentation complete: All customers assigned to actionable business segments.

Important: If “Other” count is high, we need to review segment definitions.

Segment Distribution Overview

# Calculate segment sizes and percentages
segment_dist <- customer_rfm %>%
  count(Segment, name = "Customers") %>%
  mutate(
    Percentage = round(Customers / sum(Customers) * 100, 2)
  ) %>%
  arrange(desc(Customers))

segment_dist %>%
  kable(caption = "Customer Distribution by Segment",
        col.names = c("Segment", "Customers", "% of Total")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Customer Distribution by Segment
Segment Customers % of Total
Champions 848 19.40
Hibernating 703 16.08
Lost 500 11.44
At Risk 428 9.79
Potential Loyalists 415 9.49
Loyal Customers 411 9.40
Need Attention 291 6.66
About to Sleep 276 6.31
New Customers 266 6.09
Promising 137 3.13
Can’t Lose Them 96 2.20

Segment sizes: Distribution of customers across all segments.

Business question: Which segments are largest? Does this match expected e-commerce patterns?


Initial Segment Analysis

Now we’ll analyze the characteristics and value of each segment.

Comprehensive Segment Summary

# Calculate detailed metrics for each segment
segment_summary <- customer_rfm %>%
  group_by(Segment) %>%
  summarize(
    # Customer counts
    Customers = n(),
    Pct_Customers = round(n() / nrow(customer_rfm) * 100, 2),
    
    # RFM averages
    Avg_Recency_Days = round(mean(DaysSinceLastPurchase), 1),
    Avg_Frequency = round(mean(TotalTransactions), 1),
    Avg_Monetary = round(mean(TotalSpent), 2),
    
    # Revenue metrics
    Total_Revenue = sum(TotalSpent),
    Pct_Revenue = round(sum(TotalSpent) / sum(customer_rfm$TotalSpent) * 100, 2),
    
    # Additional metrics
    Avg_Order_Value = round(mean(AverageOrderValue), 2),
    Avg_Customer_Lifetime_Days = round(mean(CustomerLifetimeDays), 0)
  ) %>%
  arrange(desc(Total_Revenue))  # Sort by revenue contribution

# Display formatted table
segment_summary %>%
  mutate(
    Avg_Monetary = dollar(Avg_Monetary, prefix = "£"),
    Total_Revenue = dollar(Total_Revenue, prefix = "£"),
    Avg_Order_Value = dollar(Avg_Order_Value, prefix = "£")
  ) %>%
  kable(caption = "Comprehensive Segment Analysis") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 10) %>%
  scroll_box(width = "100%")
Comprehensive Segment Analysis
Segment Customers Pct_Customers Avg_Recency_Days Avg_Frequency Avg_Monetary Total_Revenue Pct_Revenue Avg_Order_Value Avg_Customer_Lifetime_Days
Champions 848 19.40 10.1 12.0 £6,634.23 £5,625,826 63.30 £483.07 290
Loyal Customers 411 9.40 35.8 5.4 £2,361.91 £970,745 10.92 £427.31 238
At Risk 428 9.79 140.6 3.7 £1,744.75 £746,752 8.40 £556.33 149
Need Attention 291 6.66 30.5 3.1 £1,665.87 £484,768 5.45 £699.99 169
Hibernating 703 16.08 146.8 1.5 £399.18 £280,623 3.16 £281.56 58
Potential Loyalists 415 9.49 25.0 2.5 £538.06 £223,294 2.51 £248.35 149
Can’t Lose Them 96 2.20 230.0 2.3 £2,162.30 £207,580 2.34 £1,766.60 34
Promising 137 3.13 22.8 1.3 £891.10 £122,081 1.37 £723.99 61
Lost 500 11.44 276.6 1.0 £187.13 £93,564 1.05 £182.12 6
About to Sleep 276 6.31 86.5 1.3 £268.41 £74,081 0.83 £248.87 24
New Customers 266 6.09 26.7 1.1 £217.65 £57,895 0.65 £207.32 15

Key findings from segment analysis:

Interpretation from the table above: - Which segment drives the most revenue? Champions drive the most revenue - What’s the revenue concentration? (Do top 2-3 segments drive majority of revenue?) The revenue is concentrated between Champions and Loyal Customers - How does customer count relate to revenue contribution? We see a general trend of higher number of customers being correlated with higher revenue - Which segments show high recency (engaged) vs low recency (at-risk)? Champions show the highest high recency and Lost show the lowest recency

Revenue Concentration Analysis

# Calculate cumulative revenue by segment (Pareto analysis)
revenue_pareto <- segment_summary %>%
  arrange(desc(Total_Revenue)) %>%
  mutate(
    Cumulative_Revenue_Pct = round(cumsum(Total_Revenue) / sum(Total_Revenue) * 100, 2),
    Cumulative_Customers_Pct = round(cumsum(Customers) / sum(Customers) * 100, 2)
  ) %>%
  select(Segment, Customers, Pct_Customers, Total_Revenue, Pct_Revenue, 
         Cumulative_Customers_Pct, Cumulative_Revenue_Pct)

revenue_pareto %>%
  mutate(Total_Revenue = dollar(Total_Revenue, prefix = "£")) %>%
  kable(caption = "Revenue Concentration (Pareto Analysis)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 10) %>%
  scroll_box(width = "100%")
Revenue Concentration (Pareto Analysis)
Segment Customers Pct_Customers Total_Revenue Pct_Revenue Cumulative_Customers_Pct Cumulative_Revenue_Pct
Champions 848 19.40 £5,625,826 63.30 19.40 63.30
Loyal Customers 411 9.40 £970,745 10.92 28.80 74.23
At Risk 428 9.79 £746,752 8.40 38.60 82.63
Need Attention 291 6.66 £484,768 5.45 45.25 88.08
Hibernating 703 16.08 £280,623 3.16 61.34 91.24
Potential Loyalists 415 9.49 £223,294 2.51 70.83 93.75
Can’t Lose Them 96 2.20 £207,580 2.34 73.03 96.09
Promising 137 3.13 £122,081 1.37 76.16 97.46
Lost 500 11.44 £93,564 1.05 87.60 98.51
About to Sleep 276 6.31 £74,081 0.83 93.91 99.35
New Customers 266 6.09 £57,895 0.65 100.00 100.00

Segment Characteristics Matrix

# Create a matrix showing relative segment characteristics
segment_matrix <- customer_rfm %>%
  group_by(Segment) %>%
  summarize(
    Avg_R_Score = round(mean(R_Score), 2),
    Avg_F_Score = round(mean(F_Score), 2),
    Avg_M_Score = round(mean(M_Score), 2),
    Avg_Total_Score = round(mean(RFM_Total), 2)
  ) %>%
  arrange(desc(Avg_Total_Score))

segment_matrix %>%
  kable(caption = "Average RFM Scores by Segment",
        col.names = c("Segment", "Avg R", "Avg F", "Avg M", "Avg Total")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Average RFM Scores by Segment
Segment Avg R Avg F Avg M Avg Total
Champions 4.67 4.79 4.72 14.18
Loyal Customers 3.48 4.24 4.24 11.95
Need Attention 3.69 3.43 3.53 10.65
Potential Loyalists 3.98 3.08 2.51 9.58
At Risk 1.74 3.64 3.80 9.18
Promising 4.21 1.55 3.06 8.82
Can’t Lose Them 1.19 1.81 3.76 6.76
New Customers 3.90 1.32 1.35 6.56
Hibernating 1.90 2.15 2.14 6.18
About to Sleep 2.51 1.88 1.47 5.86
Lost 1.00 1.41 1.24 3.66

Segment profiles: Average RFM scores show the behavioral profile of each segment.

Top Customers by Segment

# Show top 3 customers from key segments
key_segments <- c("Champions", "Loyal Customers", "At Risk", "Lost")

for(seg in key_segments) {
  cat("\n### Top 3", seg, ":\n")
  
  customer_rfm %>%
    filter(Segment == seg) %>%
    arrange(desc(TotalSpent)) %>%
    head(3) %>%
    select(CustomerID, RFM_Score, DaysSinceLastPurchase, TotalTransactions, 
           TotalSpent, PrimaryCountry) %>%
    mutate(TotalSpent = dollar(TotalSpent, prefix = "£")) %>%
    kable() %>%
    kable_styling(bootstrap_options = c("striped", "hover"), font_size = 10) %>%
    print()
  
  cat("\n")
}
## 
## ### Top 3 Champions :
## <table class="table table-striped table-hover" style="font-size: 10px; margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:right;"> CustomerID </th>
##    <th style="text-align:left;"> RFM_Score </th>
##    <th style="text-align:right;"> DaysSinceLastPurchase </th>
##    <th style="text-align:right;"> TotalTransactions </th>
##    <th style="text-align:left;"> TotalSpent </th>
##    <th style="text-align:left;"> PrimaryCountry </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:right;"> 14646 </td>
##    <td style="text-align:left;"> 555 </td>
##    <td style="text-align:right;"> 1.0263889 </td>
##    <td style="text-align:right;"> 73 </td>
##    <td style="text-align:left;"> £280,206 </td>
##    <td style="text-align:left;"> Netherlands </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 18102 </td>
##    <td style="text-align:left;"> 555 </td>
##    <td style="text-align:right;"> 0.0416667 </td>
##    <td style="text-align:right;"> 60 </td>
##    <td style="text-align:left;"> £259,657 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 17450 </td>
##    <td style="text-align:left;"> 555 </td>
##    <td style="text-align:right;"> 7.9729167 </td>
##    <td style="text-align:right;"> 46 </td>
##    <td style="text-align:left;"> £194,391 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
## </tbody>
## </table>
## 
## ### Top 3 Loyal Customers :
## <table class="table table-striped table-hover" style="font-size: 10px; margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:right;"> CustomerID </th>
##    <th style="text-align:left;"> RFM_Score </th>
##    <th style="text-align:right;"> DaysSinceLastPurchase </th>
##    <th style="text-align:right;"> TotalTransactions </th>
##    <th style="text-align:left;"> TotalSpent </th>
##    <th style="text-align:left;"> PrimaryCountry </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:right;"> 16029 </td>
##    <td style="text-align:left;"> 355 </td>
##    <td style="text-align:right;"> 38.09931 </td>
##    <td style="text-align:right;"> 63 </td>
##    <td style="text-align:left;"> £80,850.84 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 12744 </td>
##    <td style="text-align:left;"> 355 </td>
##    <td style="text-align:right;"> 51.06389 </td>
##    <td style="text-align:right;"> 7 </td>
##    <td style="text-align:left;"> £21,279.29 </td>
##    <td style="text-align:left;"> Singapore </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 12678 </td>
##    <td style="text-align:left;"> 355 </td>
##    <td style="text-align:right;"> 41.99028 </td>
##    <td style="text-align:right;"> 12 </td>
##    <td style="text-align:left;"> £17,628.46 </td>
##    <td style="text-align:left;"> France </td>
##   </tr>
## </tbody>
## </table>
## 
## ### Top 3 At Risk :
## <table class="table table-striped table-hover" style="font-size: 10px; margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:right;"> CustomerID </th>
##    <th style="text-align:left;"> RFM_Score </th>
##    <th style="text-align:right;"> DaysSinceLastPurchase </th>
##    <th style="text-align:right;"> TotalTransactions </th>
##    <th style="text-align:left;"> TotalSpent </th>
##    <th style="text-align:left;"> PrimaryCountry </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:right;"> 15749 </td>
##    <td style="text-align:left;"> 145 </td>
##    <td style="text-align:right;"> 234.97917 </td>
##    <td style="text-align:right;"> 3 </td>
##    <td style="text-align:left;"> £44,534.30 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 15098 </td>
##    <td style="text-align:left;"> 135 </td>
##    <td style="text-align:right;"> 181.88264 </td>
##    <td style="text-align:right;"> 3 </td>
##    <td style="text-align:left;"> £39,916.50 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 12409 </td>
##    <td style="text-align:left;"> 235 </td>
##    <td style="text-align:right;"> 78.09167 </td>
##    <td style="text-align:right;"> 3 </td>
##    <td style="text-align:left;"> £11,072.67 </td>
##    <td style="text-align:left;"> Switzerland </td>
##   </tr>
## </tbody>
## </table>
## 
## ### Top 3 Lost :
## <table class="table table-striped table-hover" style="font-size: 10px; margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:right;"> CustomerID </th>
##    <th style="text-align:left;"> RFM_Score </th>
##    <th style="text-align:right;"> DaysSinceLastPurchase </th>
##    <th style="text-align:right;"> TotalTransactions </th>
##    <th style="text-align:left;"> TotalSpent </th>
##    <th style="text-align:left;"> PrimaryCountry </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:right;"> 14130 </td>
##    <td style="text-align:left;"> 112 </td>
##    <td style="text-align:right;"> 318.8646 </td>
##    <td style="text-align:right;"> 1 </td>
##    <td style="text-align:left;"> £480.91 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 12447 </td>
##    <td style="text-align:left;"> 112 </td>
##    <td style="text-align:right;"> 242.9757 </td>
##    <td style="text-align:right;"> 1 </td>
##    <td style="text-align:left;"> £476.49 </td>
##    <td style="text-align:left;"> Belgium </td>
##   </tr>
##   <tr>
##    <td style="text-align:right;"> 14497 </td>
##    <td style="text-align:left;"> 112 </td>
##    <td style="text-align:right;"> 300.8958 </td>
##    <td style="text-align:right;"> 1 </td>
##    <td style="text-align:left;"> £475.37 </td>
##    <td style="text-align:left;"> United Kingdom </td>
##   </tr>
## </tbody>
## </table>

Sample customers: Examples from each key segment showing their characteristics.


Segment Comparison

Let’s compare key segments side-by-side to understand differences.

Champions vs Lost Customers

# Direct comparison of best vs worst segments
comparison <- customer_rfm %>%
  filter(Segment %in% c("Champions", "Lost")) %>%
  group_by(Segment) %>%
  summarize(
    Customers = n(),
    Avg_Recency = round(mean(DaysSinceLastPurchase), 1),
    Avg_Frequency = round(mean(TotalTransactions), 1),
    Avg_Monetary = round(mean(TotalSpent), 2),
    Total_Revenue = sum(TotalSpent),
    Avg_Lifetime_Days = round(mean(CustomerLifetimeDays), 0)
  )

comparison %>%
  mutate(
    Avg_Monetary = dollar(Avg_Monetary, prefix = "£"),
    Total_Revenue = dollar(Total_Revenue, prefix = "£")
  ) %>%
  kable(caption = "Champions vs Lost Customers Comparison") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Champions vs Lost Customers Comparison
Segment Customers Avg_Recency Avg_Frequency Avg_Monetary Total_Revenue Avg_Lifetime_Days
Champions 848 10.1 12 £6,634.23 £5,625,826 290
Lost 500 276.6 1 £187.13 £93,564 6

Stark contrast: This comparison shows the dramatic difference between best and worst customers.

At-Risk vs Loyal Customers

# Compare at-risk customers with loyal ones
at_risk_comparison <- customer_rfm %>%
  filter(Segment %in% c("Loyal Customers", "At Risk")) %>%
  group_by(Segment) %>%
  summarize(
    Customers = n(),
    Avg_Recency = round(mean(DaysSinceLastPurchase), 1),
    Avg_Frequency = round(mean(TotalTransactions), 1),
    Avg_Monetary = round(mean(TotalSpent), 2),
    Total_Revenue_At_Stake = sum(TotalSpent)
  )

at_risk_comparison %>%
  mutate(
    Avg_Monetary = dollar(Avg_Monetary, prefix = "£"),
    Total_Revenue_At_Stake = dollar(Total_Revenue_At_Stake, prefix = "£")
  ) %>%
  kable(caption = "At Risk vs Loyal Customers") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
At Risk vs Loyal Customers
Segment Customers Avg_Recency Avg_Frequency Avg_Monetary Total_Revenue_At_Stake
At Risk 428 140.6 3.7 £1,744.75 £746,752
Loyal Customers 411 35.8 5.4 £2,361.91 £970,745

Risk assessment: At-Risk customers were once valuable. This shows what’s at stake if we lose them.


Key Findings Summary

Based on Day 1 analysis, here are our initial findings:

top_3_segments <- segment_summary %>%
  arrange(desc(Customers)) %>%
  head(3)

cat("\nTop 3 Segments by Customer Count:\n")
## 
## Top 3 Segments by Customer Count:
for(i in 1:3) {
  cat(sprintf("%d. %s: %d customers (%.1f%%)\n", 
              i, top_3_segments$Segment[i], top_3_segments$Customers[i], 
              top_3_segments$Pct_Customers[i]))
}
## 1. Champions: 848 customers (19.4%)
## 2. Hibernating: 703 customers (16.1%)
## 3. Lost: 500 customers (11.4%)
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Revenue Concentration Insights

cat("REVENUE CONCENTRATION:\n")
## REVENUE CONCENTRATION:
cat(rep("=", 60), "\n", sep = "")
## ============================================================
top_revenue_segments <- segment_summary %>%
  arrange(desc(Total_Revenue)) %>%
  head(3)

cat("\nTop 3 Segments by Revenue Contribution:\n")
## 
## Top 3 Segments by Revenue Contribution:
for(i in 1:3) {
  cat(sprintf("%d. %s: %s (%.1f%% of total revenue)\n", 
              i, top_revenue_segments$Segment[i], 
              dollar(top_revenue_segments$Total_Revenue[i], prefix = "£"),
              top_revenue_segments$Pct_Revenue[i]))
}
## 1. Champions: £5,625,826 (63.3% of total revenue)
## 2. Loyal Customers: £970,745 (10.9% of total revenue)
## 3. At Risk: £746,752 (8.4% of total revenue)
# Calculate concentration ratio
top_2_pct <- sum(top_revenue_segments$Pct_Revenue[1:2])
cat(sprintf("\nTop 2 segments drive %.1f%% of revenue\n", top_2_pct))
## 
## Top 2 segments drive 74.2% of revenue
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Risk Assessment

cat("RISK ASSESSMENT:\n")
## RISK ASSESSMENT:
cat(rep("=", 60), "\n", sep = "")
## ============================================================
# At-risk and churned customers
at_risk_total <- segment_summary %>%
  filter(Segment %in% c("At Risk", "Can't Lose Them")) %>%
  summarize(
    Total_Customers = sum(Customers),
    Total_Revenue = sum(Total_Revenue),
    Pct_Revenue = sum(Pct_Revenue)
  )

churned_total <- segment_summary %>%
  filter(Segment %in% c("Hibernating", "Lost")) %>%
  summarize(
    Total_Customers = sum(Customers),
    Pct_Customers = sum(Pct_Customers)
  )

cat("\nHigh-Value Customers at Risk:\n")
## 
## High-Value Customers at Risk:
cat(sprintf("  Customers: %d\n", at_risk_total$Total_Customers))
##   Customers: 524
cat(sprintf("  Revenue at stake: %s (%.1f%% of total)\n", 
            dollar(at_risk_total$Total_Revenue, prefix = "£"),
            at_risk_total$Pct_Revenue))
##   Revenue at stake: £954,332 (10.7% of total)
cat("\nChurned/Hibernating Customers:\n")
## 
## Churned/Hibernating Customers:
cat(sprintf("  Customers: %d (%.1f%% of customer base)\n", 
            churned_total$Total_Customers,
            churned_total$Pct_Customers))
##   Customers: 1203 (27.5% of customer base)
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Opportunity Assessment

cat("GROWTH OPPORTUNITIES:\n")
## GROWTH OPPORTUNITIES:
cat(rep("=", 60), "\n", sep = "")
## ============================================================
# Growth segments
growth_segments <- segment_summary %>%
  filter(Segment %in% c("Potential Loyalists", "Promising", "New Customers"))

cat("\nCustomers with Growth Potential:\n")
## 
## Customers with Growth Potential:
for(i in 1:nrow(growth_segments)) {
  cat(sprintf("  %s: %d customers (%.1f%% of base)\n",
              growth_segments$Segment[i],
              growth_segments$Customers[i],
              growth_segments$Pct_Customers[i]))
}
##   Potential Loyalists: 415 customers (9.5% of base)
##   Promising: 137 customers (3.1% of base)
##   New Customers: 266 customers (6.1% of base)
cat(sprintf("\nTotal growth opportunity: %d customers (%.1f%% of base)\n",
            sum(growth_segments$Customers),
            sum(growth_segments$Pct_Customers)))
## 
## Total growth opportunity: 818 customers (18.7% of base)
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Initial findings documented: These insights will guide Day 2 visualizations and Day 3 recommendations.


Data Export for Visualization

Save the RFM-scored dataset for Day 2 visualization work.

# Export the complete RFM dataset
write_csv(customer_rfm, here("data", "processed", "customer_rfm_scored.csv"))
cat("RFM-scored data exported to: data/processed/customer_rfm_scored.csv\n")
## RFM-scored data exported to: data/processed/customer_rfm_scored.csv
# Export segment summary
write_csv(segment_summary, here("data", "processed", "rfm_segment_summary.csv"))
cat("Segment summary exported to: data/processed/rfm_segment_summary.csv\n")
## Segment summary exported to: data/processed/rfm_segment_summary.csv

Export complete: Data ready for Day 2 visualization work.


Next Steps

Day 1 Complete! ✓

We’ve successfully: - ✅ Calculated RFM scores for all customers - ✅ Assigned customers to 11 business segments - ✅ Analyzed segment characteristics and revenue distribution - ✅ Identified key risks and opportunities

Day 2 Preview:

Tomorrow we’ll create compelling visualizations including: 1. Customer distribution by segment (bar chart) 2. Revenue contribution comparison (stacked bar) 3. RFM 3D scatter plot (interactive) 4. Score heatmap 5. Segment radar charts 6. Pareto analysis 7. Customer lifecycle flow

Day 3 Preview:

Final day will focus on: - Business recommendations for each segment - Marketing action plans - Budget allocation strategy - Expected ROI calculations - Segment customer list exports


DAY 2: VISUALIZATIONS & DEEP DIVE ANALYSIS


Core RFM Visualizations

Now we’ll create compelling visualizations to communicate our RFM findings to stakeholders.

Visualization 1: Customer Distribution by Segment

This shows how many customers are in each segment and helps identify the size of each group.

# Prepare data for visualization
segment_dist_plot <- segment_summary %>%
  arrange(desc(Customers)) %>%
  mutate(
    Segment = factor(Segment, levels = Segment),  # Preserve order
    # Color code by segment type
    Segment_Type = case_when(
      Segment %in% c("Champions", "Loyal Customers") ~ "High Value",
      Segment %in% c("Potential Loyalists", "Promising", "New Customers") ~ "Growth Opportunity",
      Segment %in% c("Need Attention", "About to Sleep") ~ "Needs Engagement",
      Segment %in% c("At Risk", "Can't Lose Them") ~ "At Risk",
      TRUE ~ "Low Value/Churned"
    )
  )

# Create horizontal bar chart
ggplot(segment_dist_plot, aes(x = Customers, y = Segment, fill = Segment_Type)) +
  geom_col() +
  geom_text(aes(label = paste0(Customers, " (", Pct_Customers, "%)")), 
            hjust = -0.1, size = 3.5) +
  scale_fill_manual(
    values = c(
      "High Value" = "#2E7D32",
      "Growth Opportunity" = "#1976D2", 
      "Needs Engagement" = "#F57C00",
      "At Risk" = "#D32F2F",
      "Low Value/Churned" = "#757575"
    )
  ) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Customer Distribution by RFM Segment",
    subtitle = "Breakdown of customer base across 11 strategic segments",
    x = "Number of Customers",
    y = NULL,
    fill = "Segment Category",
    caption = "Source: RFM Analysis of Customer Transaction Data"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank()
  )

Key insights from customer distribution:

Interpret this visualization: - Which segments have the most customers? The top three segments with most costumers are Champions, Hibernating, Lost - Are growth opportunity segments substantial? The only growth opportunity groups that seems substantial is the Potential Loyalists - What % of customers are in at-risk or churned categories? Approximatelly 40%


Visualization 2: Revenue Contribution vs Customer Count

This reveals the critical insight: which segments drive revenue vs how many customers they represent.

# Prepare data comparing customer % to revenue %
revenue_comparison <- segment_summary %>%
  select(Segment, Pct_Customers, Pct_Revenue) %>%
  pivot_longer(
    cols = c(Pct_Customers, Pct_Revenue),
    names_to = "Metric",
    values_to = "Percentage"
  ) %>%
  mutate(
    Metric = ifelse(Metric == "Pct_Customers", "% of Customers", "% of Revenue"),
    Segment = factor(Segment, levels = segment_summary$Segment)  # Same order as previous
  )

# Create grouped bar chart
ggplot(revenue_comparison, aes(x = Percentage, y = Segment, fill = Metric)) +
  geom_col(position = "dodge", width = 0.7) +
  geom_text(
    aes(label = sprintf("%.1f%%", Percentage)),
    position = position_dodge(width = 0.7),
    hjust = -0.1,
    size = 3
  ) +
  scale_fill_manual(
    values = c("% of Customers" = "#1976D2", "% of Revenue" = "#2E7D32")
  ) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.2))) +
  labs(
    title = "Revenue Concentration: Customer Count vs Revenue Contribution",
    subtitle = "Identifying segments with disproportionate revenue impact",
    x = "Percentage (%)",
    y = NULL,
    fill = "Metric",
    caption = "Green bars > Blue bars indicate high-value segments"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank()
  )

Revenue concentration insights:

Look for segments where: - Green bar >> Blue bar: High-value segments (few customers, lots of revenue) - Blue bar >> Green bar: Large but low-value segments - Bars similar: Proportionate contribution


Visualization 3: RFM Score Heatmap

Shows the distribution of customers across all RFM score combinations.

# Create frequency table of R and F scores (using M as color intensity)
rfm_heatmap_data <- customer_rfm %>%
  group_by(R_Score, F_Score) %>%
  summarize(
    Customer_Count = n(),
    Avg_Monetary = mean(M_Score),
    Total_Revenue = sum(TotalSpent),
    .groups = "drop"
  )

# Create heatmap
ggplot(rfm_heatmap_data, aes(x = factor(F_Score), y = factor(R_Score), fill = Customer_Count)) +
  geom_tile(color = "white", linewidth = 1) +
  geom_text(aes(label = Customer_Count), color = "white", fontface = "bold", size = 4) +
  scale_fill_viridis_c(option = "plasma", direction = -1) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  labs(
    title = "RFM Score Distribution Heatmap",
    subtitle = "Customer concentration across Recency and Frequency dimensions",
    x = "Frequency Score (1=Low, 5=High)",
    y = "Recency Score (1=Long Ago, 5=Recent)",
    fill = "Number of\nCustomers",
    caption = "Cell intensity shows customer count | Top-right (5,5) = Champions"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    panel.grid = element_blank(),
    axis.text = element_text(size = 11)
  )

Heatmap insights:

  • Top-right corner (5,5): Champions - recent and frequent buyers
  • Bottom-left corner (1,1): Lost customers - inactive and rare buyers
  • Clusters: Show common customer behavior patterns

Visualization 4: Segment Comparison Radar Chart

Compares the behavioral profiles of key segments across RFM dimensions.

# Prepare data for radar chart (normalize scores to 0-100 scale)
radar_data <- customer_rfm %>%
  filter(Segment %in% c("Champions", "Loyal Customers", "At Risk", "Lost", "New Customers")) %>%
  group_by(Segment) %>%
  summarize(
    Recency = mean(R_Score) * 20,      # Convert 1-5 to 20-100
    Frequency = mean(F_Score) * 20,
    Monetary = mean(M_Score) * 20,
    .groups = "drop"
  ) %>%
  pivot_longer(
    cols = c(Recency, Frequency, Monetary),
    names_to = "Metric",
    values_to = "Score"
  )

# Create faceted radar-style chart
ggplot(radar_data, aes(x = Metric, y = Score, group = Segment, color = Segment)) +
  geom_polygon(aes(fill = Segment), alpha = 0.2, linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 25)) +
  coord_polar() +
  facet_wrap(~Segment, ncol = 3) +
  scale_color_manual(
    values = c(
      "Champions" = "#2E7D32",
      "Loyal Customers" = "#1976D2",
      "At Risk" = "#D32F2F",
      "Lost" = "#757575",
      "New Customers" = "#F57C00"
    )
  ) +
  scale_fill_manual(
    values = c(
      "Champions" = "#2E7D32",
      "Loyal Customers" = "#1976D2",
      "At Risk" = "#D32F2F",
      "Lost" = "#757575",
      "New Customers" = "#F57C00"
    )
  ) +
  labs(
    title = "RFM Profile Comparison: Key Customer Segments",
    subtitle = "Behavioral characteristics across Recency, Frequency, and Monetary dimensions",
    x = NULL,
    y = "Score (0-100)",
    caption = "Larger shapes indicate stronger performance across RFM metrics"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    legend.position = "none",
    strip.text = element_text(face = "bold", size = 12)
  )

Profile insights:

  • Champions: Should show high scores across all three metrics (large triangle)
  • At Risk: High Frequency/Monetary but low Recency (lopsided shape)
  • New Customers: High Recency but low Frequency/Monetary
  • Lost: Low across all dimensions (small triangle)

Visualization 5: Revenue Pareto Chart

The classic 80/20 analysis showing cumulative revenue by customer rank.

# Calculate cumulative revenue by customer
pareto_data <- customer_rfm %>%
  arrange(desc(TotalSpent)) %>%
  mutate(
    Customer_Rank = row_number(),
    Customer_Percentile = (Customer_Rank / n()) * 100,
    Cumulative_Revenue = cumsum(TotalSpent),
    Cumulative_Revenue_Pct = (Cumulative_Revenue / sum(TotalSpent)) * 100
  )

# Find 80% point
point_80 <- pareto_data %>%
  filter(Cumulative_Revenue_Pct >= 80) %>%
  slice(1)

# Create Pareto chart
ggplot(pareto_data, aes(x = Customer_Percentile, y = Cumulative_Revenue_Pct)) +
  geom_line(color = "#1976D2", linewidth = 1.5) +
  geom_area(fill = "#1976D2", alpha = 0.2) +
  geom_hline(yintercept = 80, linetype = "dashed", color = "#D32F2F", linewidth = 1) +
  geom_vline(xintercept = point_80$Customer_Percentile, linetype = "dashed", 
             color = "#D32F2F", linewidth = 1) +
  annotate(
    "text",
    x = point_80$Customer_Percentile + 10,
    y = 70,
    label = sprintf("Top %.1f%% of customers\ndrive 80%% of revenue", 
                    point_80$Customer_Percentile),
    color = "#D32F2F",
    fontface = "bold",
    size = 4
  ) +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_y_continuous(breaks = seq(0, 100, 10)) +
  labs(
    title = "Revenue Pareto Analysis: The 80/20 Rule",
    subtitle = "Cumulative revenue contribution by customer percentile",
    x = "Customer Percentile (%)",
    y = "Cumulative Revenue Contribution (%)",
    caption = "Red lines show the 80/20 inflection point"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    panel.grid.minor = element_blank()
  )

Pareto insights:

The chart reveals: - What % of customers drive 80% of revenue? 25.9% of customers drive 80% of revenue - How concentrated is revenue risk? - Is it closer to 80/20, 90/10, or 70/30? it’s closer to 70/30


Visualization 6: Segment Revenue Breakdown

Shows absolute revenue contribution with segment composition.

# Calculate total revenue for reference line
total_revenue <- sum(segment_summary$Total_Revenue)
avg_revenue <- total_revenue / nrow(segment_summary)

# Create bar chart with reference line
ggplot(segment_summary, aes(x = reorder(Segment, Total_Revenue), y = Total_Revenue, 
                             fill = Segment)) +
  geom_col() +
  geom_hline(yintercept = avg_revenue, linetype = "dashed", color = "gray30", linewidth = 1) +
  geom_text(
    aes(label = scales::dollar(Total_Revenue, prefix = "£", scale = 1e-3, suffix = "K")),
    hjust = -0.1,
    size = 3.5
  ) +
  annotate(
    "text",
    x = 1,
    y = avg_revenue * 1.1,
    label = "Average segment revenue",
    color = "gray30",
    size = 3
  ) +
  scale_y_continuous(
    labels = scales::dollar_format(prefix = "£", scale = 1e-3, suffix = "K"),
    expand = expansion(mult = c(0, 0.15))
  ) +
  scale_fill_viridis_d(option = "turbo") +
  coord_flip() +
  labs(
    title = "Total Revenue Contribution by Segment",
    subtitle = "Absolute revenue generated by each customer segment",
    x = NULL,
    y = "Total Revenue (£ thousands)",
    caption = "Dashed line shows average segment revenue"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    legend.position = "none",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank()
  )

Revenue breakdown insights:

  • Which segment is the revenue leader?
  • How many segments are above/below average?
  • What’s the revenue gap between best and worst segments?

Visualization 7: Customer Lifecycle Journey

Shows potential customer progression through segments.

# Define typical customer journey paths
lifecycle_paths <- tribble(
  ~From, ~To, ~Flow,
  "New Customers", "Promising", 1,
  "New Customers", "Lost", 1,
  "Promising", "Potential Loyalists", 2,
  "Promising", "About to Sleep", 1,
  "Potential Loyalists", "Loyal Customers", 3,
  "Potential Loyalists", "Need Attention", 1,
  "Loyal Customers", "Champions", 2,
  "Loyal Customers", "At Risk", 1,
  "Champions", "Need Attention", 1,
  "Need Attention", "At Risk", 2,
  "At Risk", "Can't Lose Them", 1,
  "At Risk", "Hibernating", 2,
  "About to Sleep", "Hibernating", 2,
  "Hibernating", "Lost", 3
)

# Create alluvial-style visualization (simplified version)
ggplot(lifecycle_paths, aes(x = From, y = Flow, fill = To)) +
  geom_col(width = 0.7) +
  facet_wrap(~From, scales = "free_x", ncol = 4) +
  scale_fill_viridis_d() +
  labs(
    title = "Customer Lifecycle Journey: Segment Transitions",
    subtitle = "Common paths customers take through RFM segments (illustrative)",
    x = "Current Segment",
    y = "Relative Flow",
    fill = "Next Segment",
    caption = "Flow width represents likelihood of transition"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40"),
    axis.text.x = element_blank(),
    panel.grid = element_blank(),
    strip.text = element_text(face = "bold", size = 9)
  )

Lifecycle insights:

This conceptual diagram shows: - Ideal progression: New → Promising → Potential Loyalist → Loyal → Champion - Risk paths: Champion → Need Attention → At Risk → Lost - Intervention points: Where to focus retention efforts


Deep Dive: Key Segments

Now we’ll analyze the most important segments in detail.

Champions: The Best Customers

# Detailed Champions analysis
champions <- customer_rfm %>%
  filter(Segment == "Champions")

cat("CHAMPIONS SEGMENT DEEP DIVE\n")
## CHAMPIONS SEGMENT DEEP DIVE
cat(rep("=", 60), "\n\n", sep = "")
## ============================================================
cat("Size & Value:\n")
## Size & Value:
cat(sprintf("  Total customers: %d (%.1f%% of base)\n", 
            nrow(champions),
            nrow(champions) / nrow(customer_rfm) * 100))
##   Total customers: 848 (19.4% of base)
cat(sprintf("  Total revenue: %s (%.1f%% of total)\n",
            dollar(sum(champions$TotalSpent), prefix = "£"),
            sum(champions$TotalSpent) / sum(customer_rfm$TotalSpent) * 100))
##   Total revenue: £5,625,826 (63.3% of total)
cat(sprintf("  Average customer value: %s\n",
            dollar(mean(champions$TotalSpent), prefix = "£")))
##   Average customer value: £6,634.23
cat("\nBehavioral Characteristics:\n")
## 
## Behavioral Characteristics:
cat(sprintf("  Avg days since last purchase: %.1f days\n", 
            mean(champions$DaysSinceLastPurchase)))
##   Avg days since last purchase: 10.1 days
cat(sprintf("  Avg purchase frequency: %.1f transactions\n", 
            mean(champions$TotalTransactions)))
##   Avg purchase frequency: 12.0 transactions
cat(sprintf("  Avg order value: %s\n",
            dollar(mean(champions$AverageOrderValue), prefix = "£")))
##   Avg order value: £483.07
cat(sprintf("  Avg customer lifetime: %.0f days (%.1f months)\n",
            mean(champions$CustomerLifetimeDays),
            mean(champions$CustomerLifetimeDays) / 30))
##   Avg customer lifetime: 290 days (9.7 months)
cat("\nGeographic Distribution:\n")
## 
## Geographic Distribution:
champions %>%
  count(PrimaryCountry, sort = TRUE) %>%
  head(5) %>%
  mutate(Pct = round(n / nrow(champions) * 100, 1)) %>%
  kable(col.names = c("Country", "Customers", "% of Champions")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  print()
## <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:left;"> Country </th>
##    <th style="text-align:right;"> Customers </th>
##    <th style="text-align:right;"> % of Champions </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;"> United Kingdom </td>
##    <td style="text-align:right;"> 750 </td>
##    <td style="text-align:right;"> 88.4 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Germany </td>
##    <td style="text-align:right;"> 29 </td>
##    <td style="text-align:right;"> 3.4 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> France </td>
##    <td style="text-align:right;"> 27 </td>
##    <td style="text-align:right;"> 3.2 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Belgium </td>
##    <td style="text-align:right;"> 8 </td>
##    <td style="text-align:right;"> 0.9 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Spain </td>
##    <td style="text-align:right;"> 5 </td>
##    <td style="text-align:right;"> 0.6 </td>
##   </tr>
## </tbody>
## </table>
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Champions insights: These are your most valuable customers. Protect and nurture them at all costs.


Loyal Customers: The Backbone

# Detailed Loyal Customers analysis
loyal <- customer_rfm %>%
  filter(Segment == "Loyal Customers")

cat("LOYAL CUSTOMERS SEGMENT DEEP DIVE\n")
## LOYAL CUSTOMERS SEGMENT DEEP DIVE
cat(rep("=", 60), "\n\n", sep = "")
## ============================================================
cat("Size & Value:\n")
## Size & Value:
cat(sprintf("  Total customers: %d (%.1f%% of base)\n", 
            nrow(loyal),
            nrow(loyal) / nrow(customer_rfm) * 100))
##   Total customers: 411 (9.4% of base)
cat(sprintf("  Total revenue: %s (%.1f%% of total)\n",
            dollar(sum(loyal$TotalSpent), prefix = "£"),
            sum(loyal$TotalSpent) / sum(customer_rfm$TotalSpent) * 100))
##   Total revenue: £970,745 (10.9% of total)
cat(sprintf("  Average customer value: %s\n",
            dollar(mean(loyal$TotalSpent), prefix = "£")))
##   Average customer value: £2,361.91
cat("\nBehavioral Characteristics:\n")
## 
## Behavioral Characteristics:
cat(sprintf("  Avg days since last purchase: %.1f days\n", 
            mean(loyal$DaysSinceLastPurchase)))
##   Avg days since last purchase: 35.8 days
cat(sprintf("  Avg purchase frequency: %.1f transactions\n", 
            mean(loyal$TotalTransactions)))
##   Avg purchase frequency: 5.4 transactions
cat(sprintf("  Avg order value: %s\n",
            dollar(mean(loyal$AverageOrderValue), prefix = "£")))
##   Avg order value: £427.31
cat("\nUpgrade Potential:\n")
## 
## Upgrade Potential:
loyal_upgrade <- loyal %>%
  mutate(
    Needs_Recency = R_Score < 5,
    Needs_Frequency = F_Score < 5,
    Needs_Monetary = M_Score < 5
  )

cat(sprintf("  Could improve recency: %d customers (%.1f%%)\n",
            sum(loyal_upgrade$Needs_Recency),
            sum(loyal_upgrade$Needs_Recency) / nrow(loyal) * 100))
##   Could improve recency: 368 customers (89.5%)
cat(sprintf("  Could improve frequency: %d customers (%.1f%%)\n",
            sum(loyal_upgrade$Needs_Frequency),
            sum(loyal_upgrade$Needs_Frequency) / nrow(loyal) * 100))
##   Could improve frequency: 296 customers (72.0%)
cat(sprintf("  Could improve spend: %d customers (%.1f%%)\n",
            sum(loyal_upgrade$Needs_Monetary),
            sum(loyal_upgrade$Needs_Monetary) / nrow(loyal) * 100))
##   Could improve spend: 270 customers (65.7%)
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Loyal customers insights: These customers are close to becoming Champions. Focus on moving them up.


At Risk: The Danger Zone

# Detailed At Risk analysis
at_risk <- customer_rfm %>%
  filter(Segment == "At Risk")

cat("AT RISK SEGMENT DEEP DIVE\n")
## AT RISK SEGMENT DEEP DIVE
cat(rep("=", 60), "\n\n", sep = "")
## ============================================================
cat("Size & Risk Assessment:\n")
## Size & Risk Assessment:
cat(sprintf("  Total customers: %d (%.1f%% of base)\n", 
            nrow(at_risk),
            nrow(at_risk) / nrow(customer_rfm) * 100))
##   Total customers: 428 (9.8% of base)
cat(sprintf("  Revenue at stake: %s (%.1f%% of total)\n",
            dollar(sum(at_risk$TotalSpent), prefix = "£"),
            sum(at_risk$TotalSpent) / sum(customer_rfm$TotalSpent) * 100))
##   Revenue at stake: £746,752 (8.4% of total)
cat(sprintf("  Average customer value: %s\n",
            dollar(mean(at_risk$TotalSpent), prefix = "£")))
##   Average customer value: £1,744.75
cat("\nWhy They're At Risk:\n")
## 
## Why They're At Risk:
cat(sprintf("  Avg days since last purchase: %.1f days (LONG TIME!)\n", 
            mean(at_risk$DaysSinceLastPurchase)))
##   Avg days since last purchase: 140.6 days (LONG TIME!)
cat(sprintf("  BUT avg historical frequency: %.1f transactions\n", 
            mean(at_risk$TotalTransactions)))
##   BUT avg historical frequency: 3.7 transactions
cat(sprintf("  AND avg historical spend: %s\n",
            dollar(mean(at_risk$TotalSpent), prefix = "£")))
##   AND avg historical spend: £1,744.75
cat("\nUrgency Tiers:\n")
## 
## Urgency Tiers:
at_risk_urgency <- at_risk %>%
  mutate(
    Urgency = case_when(
      DaysSinceLastPurchase > 180 ~ "Critical (180+ days)",
      DaysSinceLastPurchase > 120 ~ "High (120-180 days)",
      DaysSinceLastPurchase > 90 ~ "Medium (90-120 days)",
      TRUE ~ "Moderate (<90 days)"
    )
  ) %>%
  count(Urgency) %>%
  mutate(
    Pct = round(n / nrow(at_risk) * 100, 1),
    Est_Revenue = n * mean(at_risk$TotalSpent)
  )

at_risk_urgency %>%
  arrange(desc(n)) %>%
  mutate(Est_Revenue = dollar(Est_Revenue, prefix = "£")) %>%
  kable(col.names = c("Urgency Level", "Customers", "% of At Risk", "Est. Revenue Value")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  print()
## <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:left;"> Urgency Level </th>
##    <th style="text-align:right;"> Customers </th>
##    <th style="text-align:right;"> % of At Risk </th>
##    <th style="text-align:left;"> Est. Revenue Value </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;"> Moderate (&lt;90 days) </td>
##    <td style="text-align:right;"> 126 </td>
##    <td style="text-align:right;"> 29.4 </td>
##    <td style="text-align:left;"> £219,838 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Critical (180+ days) </td>
##    <td style="text-align:right;"> 110 </td>
##    <td style="text-align:right;"> 25.7 </td>
##    <td style="text-align:left;"> £191,922 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> High (120-180 days) </td>
##    <td style="text-align:right;"> 98 </td>
##    <td style="text-align:right;"> 22.9 </td>
##    <td style="text-align:left;"> £170,985 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Medium (90-120 days) </td>
##    <td style="text-align:right;"> 94 </td>
##    <td style="text-align:right;"> 22.0 </td>
##    <td style="text-align:left;"> £164,006 </td>
##   </tr>
## </tbody>
## </table>
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

At Risk insights: These high-value customers are slipping away. Immediate intervention needed.


Lost vs New: The Extremes

# Compare Lost and New customers
comparison_extremes <- customer_rfm %>%
  filter(Segment %in% c("Lost", "New Customers")) %>%
  group_by(Segment) %>%
  summarize(
    Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Avg_Recency = round(mean(DaysSinceLastPurchase), 1),
    Avg_Frequency = round(mean(TotalTransactions), 1),
    Avg_Monetary = round(mean(TotalSpent), 2),
    Total_Revenue = sum(TotalSpent)
  )

cat("LOST VS NEW CUSTOMERS COMPARISON\n")
## LOST VS NEW CUSTOMERS COMPARISON
cat(rep("=", 60), "\n\n", sep = "")
## ============================================================
comparison_extremes %>%
  mutate(
    Avg_Monetary = dollar(Avg_Monetary, prefix = "£"),
    Total_Revenue = dollar(Total_Revenue, prefix = "£")
  ) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  print()
## <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:left;"> Segment </th>
##    <th style="text-align:right;"> Customers </th>
##    <th style="text-align:right;"> Pct_of_Base </th>
##    <th style="text-align:right;"> Avg_Recency </th>
##    <th style="text-align:right;"> Avg_Frequency </th>
##    <th style="text-align:left;"> Avg_Monetary </th>
##    <th style="text-align:left;"> Total_Revenue </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;"> Lost </td>
##    <td style="text-align:right;"> 500 </td>
##    <td style="text-align:right;"> 11.44 </td>
##    <td style="text-align:right;"> 276.6 </td>
##    <td style="text-align:right;"> 1.0 </td>
##    <td style="text-align:left;"> £187.13 </td>
##    <td style="text-align:left;"> £93,564.10 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> New Customers </td>
##    <td style="text-align:right;"> 266 </td>
##    <td style="text-align:right;"> 6.09 </td>
##    <td style="text-align:right;"> 26.7 </td>
##    <td style="text-align:right;"> 1.1 </td>
##    <td style="text-align:left;"> £217.65 </td>
##    <td style="text-align:left;"> £57,895.49 </td>
##   </tr>
## </tbody>
## </table>
cat("\nStrategic Implications:\n")
## 
## Strategic Implications:
cat(sprintf("  Lost customer reactivation potential: %s\n",
            dollar(sum((customer_rfm %>% filter(Segment == "Lost"))$TotalSpent), prefix = "£")))
##   Lost customer reactivation potential: £93,564.10
cat(sprintf("  New customer development potential: High (early stage)\n"))
##   New customer development potential: High (early stage)
cat(sprintf("  Resource allocation: Focus on New (prevent becoming Lost)\n"))
##   Resource allocation: Focus on New (prevent becoming Lost)
cat("\n", rep("=", 60), "\n\n", sep = "")
## 
## ============================================================

Extremes insights: Lost customers show what happens without engagement. New customers are opportunities.


Interactive Elements

Create interactive visualizations for deeper exploration.

Interactive Segment Table

# Create interactive, searchable table of all customers with RFM scores
customer_rfm %>%
  select(CustomerID, Segment, RFM_Score, R_Score, F_Score, M_Score,
         DaysSinceLastPurchase, TotalTransactions, TotalSpent, 
         AverageOrderValue, PrimaryCountry) %>%
  mutate(
    TotalSpent = round(TotalSpent, 2),
    AverageOrderValue = round(AverageOrderValue, 2)
  ) %>%
  arrange(desc(TotalSpent)) %>%
  datatable(
    caption = "Interactive Customer RFM Segmentation Table",
    options = list(
      pageLength = 25,
      scrollX = TRUE,
      dom = 'Bfrtip',
      buttons = c('copy', 'csv', 'excel')
    ),
    filter = "top",
    rownames = FALSE,
    colnames = c(
      "Customer ID", "Segment", "RFM Score", "R", "F", "M",
      "Days Since Purchase", "Total Purchases", "Total Spent (£)", 
      "Avg Order Value (£)", "Country"
    )
  ) %>%
  formatCurrency(c("TotalSpent", "AverageOrderValue"), "£") %>%
  formatStyle(
    "Segment",
    backgroundColor = styleEqual(
      c("Champions", "Loyal Customers", "At Risk", "Lost"),
      c("#C8E6C9", "#BBDEFB", "#FFCDD2", "#E0E0E0")
    )
  )

Interactive table: Filter, sort, and search all customers by segment and metrics.

Usage: - Search for specific customers - Filter by segment - Export to Excel for marketing teams


Save Visualizations

Export all visualizations for reports and presentations.

# Create visualizations folder if it doesn't exist
viz_dir <- here("visualizations", "static")
if(!dir.exists(viz_dir)) {
  dir.create(viz_dir, recursive = TRUE)
}

cat("Saving visualizations to:", viz_dir, "\n\n")
## Saving visualizations to: /Users/joaovictorgomes/Documents/ecommerce-customer-analytics/visualizations/static
# Note: In actual implementation, you would re-create each plot and save
# For now, we'll document what should be saved

cat("Visualizations to save:\n")
## Visualizations to save:
cat("  1. customer_distribution_by_segment.png\n")
##   1. customer_distribution_by_segment.png
cat("  2. revenue_contribution_comparison.png\n")
##   2. revenue_contribution_comparison.png
cat("  3. rfm_score_heatmap.png\n")
##   3. rfm_score_heatmap.png
cat("  4. segment_radar_profiles.png\n")
##   4. segment_radar_profiles.png
cat("  5. revenue_pareto_chart.png\n")
##   5. revenue_pareto_chart.png
cat("  6. segment_revenue_breakdown.png\n")
##   6. segment_revenue_breakdown.png
cat("  7. customer_lifecycle_journey.png\n\n")
##   7. customer_lifecycle_journey.png
cat("All visualizations created and documented!\n")
## All visualizations created and documented!

Visualization exports: High-resolution images saved for inclusion in reports and presentations.


Day 2 Summary

Visualization Day Complete! ✓


DAY 3: BUSINESS INSIGHTS & RECOMMENDATIONS


Strategic Business Recommendations

Based on our RFM analysis, here are actionable recommendations for each customer segment.

Segment-Specific Action Plans

1. Champions: Protect and Amplify

# Calculate Champions metrics for recommendations
champions_metrics <- customer_rfm %>%
  filter(Segment == "Champions") %>%
  summarize(
    Total_Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Total_Revenue = sum(TotalSpent),
    Pct_of_Revenue = round(sum(TotalSpent) / sum(customer_rfm$TotalSpent) * 100, 2),
    Avg_Customer_Value = mean(TotalSpent),
    Avg_Order_Value = mean(AverageOrderValue)
  )

cat("CHAMPIONS SEGMENT STRATEGY\n")
## CHAMPIONS SEGMENT STRATEGY
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers (%.1f%% of base)\n", 
            champions_metrics$Total_Customers,
            champions_metrics$Pct_of_Base))
##   • 848 customers (19.4% of base)
cat(sprintf("  • %s revenue (%.1f%% of total)\n",
            dollar(champions_metrics$Total_Revenue, prefix = "£"),
            champions_metrics$Pct_of_Revenue))
##   • £5,625,826 revenue (63.3% of total)
cat(sprintf("  • Average customer value: %s\n",
            dollar(champions_metrics$Avg_Customer_Value, prefix = "£")))
##   • Average customer value: £6,634.23
cat("\n📋 RECOMMENDED ACTIONS:\n\n")
## 
## 📋 RECOMMENDED ACTIONS:
cat("1. VIP Loyalty Program\n")
## 1. VIP Loyalty Program
cat("   • Exclusive benefits and early product access\n")
##    • Exclusive benefits and early product access
cat("   • Personal account manager for top spenders\n")
##    • Personal account manager for top spenders
cat("   • Invitation-only events and product launches\n")
##    • Invitation-only events and product launches
cat(sprintf("   • Expected retention: 95%%+\n\n"))
##    • Expected retention: 95%+
cat("2. Referral & Advocacy Program\n")
## 2. Referral & Advocacy Program
cat("   • Incentivized referrals (discount + reward)\n")
##    • Incentivized referrals (discount + reward)
cat("   • Request product reviews and testimonials\n")
##    • Request product reviews and testimonials
cat("   • Social media ambassador opportunities\n")
##    • Social media ambassador opportunities
cat(sprintf("   • Expected new customer acquisition: 20-30%% of Champions base\n\n"))
##    • Expected new customer acquisition: 20-30% of Champions base
cat("3. Premium Upsell Strategy\n")
## 3. Premium Upsell Strategy
cat(sprintf("   • Current AOV: %s\n", dollar(champions_metrics$Avg_Order_Value, prefix = "£")))
##    • Current AOV: £483.07
cat("   • Target: +20% through premium product recommendations\n")
##    • Target: +20% through premium product recommendations
cat("   • Personalized product bundles\n")
##    • Personalized product bundles
cat(sprintf("   • Revenue uplift potential: %s\n\n",
            dollar(champions_metrics$Total_Revenue * 0.20, prefix = "£")))
##    • Revenue uplift potential: £1,125,165
cat("4. Communication Strategy\n")
## 4. Communication Strategy
cat("   • Monthly personalized newsletters\n")
##    • Monthly personalized newsletters
cat("   • Priority customer service (24hr response)\n")
##    • Priority customer service (24hr response)
cat("   • Birthday/anniversary special offers\n")
##    • Birthday/anniversary special offers
cat("   • Channel: Email + SMS for urgent offers\n\n")
##    • Channel: Email + SMS for urgent offers
cat("💰 BUDGET ALLOCATION: 30% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 30% of marketing budget
cat("   • Highest ROI segment\n")
##    • Highest ROI segment
cat("   • Focus on retention over acquisition\n\n")
##    • Focus on retention over acquisition
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • Retention rate: Maintain >95%\n")
##    • Retention rate: Maintain >95%
cat("   • AOV increase: +15-20%\n")
##    • AOV increase: +15-20%
cat("   • Referral rate: 25-30% of Champions\n")
##    • Referral rate: 25-30% of Champions
cat("   • Repeat purchase rate: >80% quarterly\n\n")
##    • Repeat purchase rate: >80% quarterly
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

2. Loyal Customers: Nurture to Champions

# Calculate Loyal Customers metrics
loyal_metrics <- customer_rfm %>%
  filter(Segment == "Loyal Customers") %>%
  summarize(
    Total_Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Total_Revenue = sum(TotalSpent),
    Pct_of_Revenue = round(sum(TotalSpent) / sum(customer_rfm$TotalSpent) * 100, 2),
    Avg_Customer_Value = mean(TotalSpent),
    Upgrade_Potential = sum(TotalSpent) * 0.30  # 30% uplift potential
  )

cat("LOYAL CUSTOMERS SEGMENT STRATEGY\n")
## LOYAL CUSTOMERS SEGMENT STRATEGY
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers (%.1f%% of base)\n", 
            loyal_metrics$Total_Customers,
            loyal_metrics$Pct_of_Base))
##   • 411 customers (9.4% of base)
cat(sprintf("  • %s revenue (%.1f%% of total)\n",
            dollar(loyal_metrics$Total_Revenue, prefix = "£"),
            loyal_metrics$Pct_of_Revenue))
##   • £970,745 revenue (10.9% of total)
cat(sprintf("  • Upgrade to Champions potential: %s\n",
            dollar(loyal_metrics$Upgrade_Potential, prefix = "£")))
##   • Upgrade to Champions potential: £291,223
cat("\n📋 RECOMMENDED ACTIONS:\n\n")
## 
## 📋 RECOMMENDED ACTIONS:
cat("1. Loyalty Points Program\n")
## 1. Loyalty Points Program
cat("   • Earn points on every purchase\n")
##    • Earn points on every purchase
cat("   • Tiered rewards (push toward Champion tier)\n")
##    • Tiered rewards (push toward Champion tier)
cat("   • Bonus points for referrals and reviews\n")
##    • Bonus points for referrals and reviews
cat("   • Expected engagement: 60-70% participation\n\n")
##    • Expected engagement: 60-70% participation
cat("2. Cross-Sell & Bundle Recommendations\n")
## 2. Cross-Sell & Bundle Recommendations
cat("   • AI-powered product recommendations\n")
##    • AI-powered product recommendations
cat("   • \"Frequently bought together\" bundles\n")
##    • "Frequently bought together" bundles
cat("   • Category expansion incentives\n")
##    • Category expansion incentives
cat("   • Expected basket size increase: 15-25%\n\n")
##    • Expected basket size increase: 15-25%
cat("3. Engagement Campaigns\n")
## 3. Engagement Campaigns
cat("   • Bi-weekly targeted email campaigns\n")
##    • Bi-weekly targeted email campaigns
cat("   • Personalized offers based on purchase history\n")
##    • Personalized offers based on purchase history
cat("   • Early access to sales (before general public)\n")
##    • Early access to sales (before general public)
cat("   • Educational content (product guides, tips)\n\n")
##    • Educational content (product guides, tips)
cat("4. Feedback & Co-Creation\n")
## 4. Feedback & Co-Creation
cat("   • Product development surveys\n")
##    • Product development surveys
cat("   • Beta testing opportunities\n")
##    • Beta testing opportunities
cat("   • Make them feel valued and heard\n\n")
##    • Make them feel valued and heard
cat("💰 BUDGET ALLOCATION: 25% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 25% of marketing budget
cat("   • High volume, good ROI\n")
##    • High volume, good ROI
cat("   • Focus on upgrade path to Champions\n\n")
##    • Focus on upgrade path to Champions
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • Champions upgrade rate: 15-20% annually\n")
##    • Champions upgrade rate: 15-20% annually
cat("   • Purchase frequency increase: +20%\n")
##    • Purchase frequency increase: +20%
cat("   • Average order value increase: +15%\n")
##    • Average order value increase: +15%
cat("   • Churn rate: <10% annually\n\n")
##    • Churn rate: <10% annually
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

3. At Risk: Urgent Win-Back

# Calculate At Risk metrics
at_risk_metrics <- customer_rfm %>%
  filter(Segment == "At Risk") %>%
  summarize(
    Total_Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Total_Revenue_at_Stake = sum(TotalSpent),
    Pct_of_Revenue = round(sum(TotalSpent) / sum(customer_rfm$TotalSpent) * 100, 2),
    Avg_Days_Inactive = mean(DaysSinceLastPurchase),
    Win_Back_Potential = sum(TotalSpent) * 0.40  # 40% recoverable
  )

cat("AT RISK SEGMENT STRATEGY (URGENT)\n")
## AT RISK SEGMENT STRATEGY (URGENT)
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("⚠️  CRITICAL ALERT: High-value customers are churning!\n\n")
## ⚠️  CRITICAL ALERT: High-value customers are churning!
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers (%.1f%% of base)\n", 
            at_risk_metrics$Total_Customers,
            at_risk_metrics$Pct_of_Base))
##   • 428 customers (9.8% of base)
cat(sprintf("  • %s revenue at stake (%.1f%% of total)\n",
            dollar(at_risk_metrics$Total_Revenue_at_Stake, prefix = "£"),
            at_risk_metrics$Pct_of_Revenue))
##   • £746,752 revenue at stake (8.4% of total)
cat(sprintf("  • Average inactivity: %.0f days\n", at_risk_metrics$Avg_Days_Inactive))
##   • Average inactivity: 141 days
cat(sprintf("  • Recoverable revenue (40%% win-back): %s\n",
            dollar(at_risk_metrics$Win_Back_Potential, prefix = "£")))
##   • Recoverable revenue (40% win-back): £298,701
cat("\n📋 RECOMMENDED ACTIONS (IMMEDIATE):\n\n")
## 
## 📋 RECOMMENDED ACTIONS (IMMEDIATE):
cat("1. Aggressive Win-Back Campaign\n")
## 1. Aggressive Win-Back Campaign
cat("   • Personalized \"We miss you\" emails\n")
##    • Personalized "We miss you" emails
cat("   • Special comeback discount (20-30% off)\n")
##    • Special comeback discount (20-30% off)
cat("   • Free shipping on next order\n")
##    • Free shipping on next order
cat("   • Phone outreach for highest-value customers\n")
##    • Phone outreach for highest-value customers
cat("   • Timeline: Launch within 7 days\n\n")
##    • Timeline: Launch within 7 days
cat("2. Understand Why They Left\n")
## 2. Understand Why They Left
cat("   • Exit survey with incentive for completion\n")
##    • Exit survey with incentive for completion
cat("   • Identify common pain points\n")
##    • Identify common pain points
cat("   • Address product/service issues\n")
##    • Address product/service issues
cat("   • Competitive analysis (did they switch?)\n\n")
##    • Competitive analysis (did they switch?)
cat("3. Limited-Time Exclusive Offers\n")
## 3. Limited-Time Exclusive Offers
cat("   • VIP-only flash sales\n")
##    • VIP-only flash sales
cat("   • Early access to new products\n")
##    • Early access to new products
cat("   • Personalized product bundles\n")
##    • Personalized product bundles
cat("   • Urgency messaging (\"Offer expires in 48 hours\")\n\n")
##    • Urgency messaging ("Offer expires in 48 hours")
cat("4. Multi-Channel Approach\n")
## 4. Multi-Channel Approach
cat("   • Email (primary)\n")
##    • Email (primary)
cat("   • SMS for high-value customers\n")
##    • SMS for high-value customers
cat("   • Retargeting ads (social media)\n")
##    • Retargeting ads (social media)
cat("   • Direct mail for top spenders\n\n")
##    • Direct mail for top spenders
cat("💰 BUDGET ALLOCATION: 20% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 20% of marketing budget
cat("   • High stakes - worth aggressive investment\n")
##    • High stakes - worth aggressive investment
cat("   • ROI on win-back: 3-5x campaign cost\n\n")
##    • ROI on win-back: 3-5x campaign cost
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • Win-back rate target: 30-40%\n")
##    • Win-back rate target: 30-40%
cat(sprintf("   • Revenue recovery target: %s\n",
            dollar(at_risk_metrics$Win_Back_Potential, prefix = "£")))
##    • Revenue recovery target: £298,701
cat("   • Response rate: >15%\n")
##    • Response rate: >15%
cat("   • Time to re-purchase: <30 days from campaign\n\n")
##    • Time to re-purchase: <30 days from campaign
cat("⏰ URGENCY TIER CAMPAIGNS:\n")
## ⏰ URGENCY TIER CAMPAIGNS:
at_risk_urgency_plan <- customer_rfm %>%
  filter(Segment == "At Risk") %>%
  mutate(
    Urgency = case_when(
      DaysSinceLastPurchase > 180 ~ "Critical",
      DaysSinceLastPurchase > 120 ~ "High",
      DaysSinceLastPurchase > 90 ~ "Medium",
      TRUE ~ "Moderate"
    )
  ) %>%
  count(Urgency) %>%
  arrange(desc(n))

for(i in 1:nrow(at_risk_urgency_plan)) {
  cat(sprintf("   • %s: %d customers - ", 
              at_risk_urgency_plan$Urgency[i],
              at_risk_urgency_plan$n[i]))
  if(at_risk_urgency_plan$Urgency[i] == "Critical") {
    cat("Immediate phone calls + maximum discount\n")
  } else if(at_risk_urgency_plan$Urgency[i] == "High") {
    cat("Personalized email + SMS + strong offer\n")
  } else if(at_risk_urgency_plan$Urgency[i] == "Medium") {
    cat("Targeted email campaign + moderate discount\n")
  } else {
    cat("Standard win-back email series\n")
  }
}
##    • Moderate: 126 customers - Standard win-back email series
##    • Critical: 110 customers - Immediate phone calls + maximum discount
##    • High: 98 customers - Personalized email + SMS + strong offer
##    • Medium: 94 customers - Targeted email campaign + moderate discount
cat("\n", rep("=", 70), "\n\n", sep = "")
## 
## ======================================================================

4. Potential Loyalists: Accelerate Growth

# Calculate Potential Loyalists metrics
potential_metrics <- customer_rfm %>%
  filter(Segment == "Potential Loyalists") %>%
  summarize(
    Total_Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Total_Revenue = sum(TotalSpent),
    Growth_Potential = sum(TotalSpent) * 0.50  # 50% growth potential
  )

cat("POTENTIAL LOYALISTS SEGMENT STRATEGY\n")
## POTENTIAL LOYALISTS SEGMENT STRATEGY
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers (%.1f%% of base) - LARGEST OPPORTUNITY\n", 
            potential_metrics$Total_Customers,
            potential_metrics$Pct_of_Base))
##   • 415 customers (9.5% of base) - LARGEST OPPORTUNITY
cat(sprintf("  • Current revenue: %s\n",
            dollar(potential_metrics$Total_Revenue, prefix = "£")))
##   • Current revenue: £223,294
cat(sprintf("  • Growth potential: %s (50%% uplift)\n",
            dollar(potential_metrics$Growth_Potential, prefix = "£")))
##   • Growth potential: £111,647 (50% uplift)
cat("\n📋 RECOMMENDED ACTIONS:\n\n")
## 
## 📋 RECOMMENDED ACTIONS:
cat("1. Membership/Subscription Program\n")
## 1. Membership/Subscription Program
cat("   • Offer subscription with benefits\n")
##    • Offer subscription with benefits
cat("   • Monthly product boxes or auto-replenishment\n")
##    • Monthly product boxes or auto-replenishment
cat("   • 10-15% discount for subscribers\n")
##    • 10-15% discount for subscribers
cat("   • Expected conversion: 20-25%\n\n")
##    • Expected conversion: 20-25%
cat("2. Personalized Nurture Campaigns\n")
## 2. Personalized Nurture Campaigns
cat("   • Welcome series for recent joiners\n")
##    • Welcome series for recent joiners
cat("   • Educational content (product usage tips)\n")
##    • Educational content (product usage tips)
cat("   • Social proof (reviews, testimonials)\n")
##    • Social proof (reviews, testimonials)
cat("   • Frequency: Weekly touchpoints\n\n")
##    • Frequency: Weekly touchpoints
cat("3. Incentivize Repeat Purchases\n")
## 3. Incentivize Repeat Purchases
cat("   • Second purchase discount (\"Complete your collection\")\n")
##    • Second purchase discount ("Complete your collection")
cat("   • Limited-time offers to create urgency\n")
##    • Limited-time offers to create urgency
cat("   • Free shipping threshold to increase basket size\n\n")
##    • Free shipping threshold to increase basket size
cat("4. Community Building\n")
## 4. Community Building
cat("   • Customer community/forum\n")
##    • Customer community/forum
cat("   • User-generated content campaigns\n")
##    • User-generated content campaigns
cat("   • Social media engagement\n\n")
##    • Social media engagement
cat("💰 BUDGET ALLOCATION: 20% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 20% of marketing budget
cat("   • Largest segment with highest growth potential\n")
##    • Largest segment with highest growth potential
cat("   • Focus on conversion to Loyal/Champions\n\n")
##    • Focus on conversion to Loyal/Champions
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • Upgrade to Loyal: 25-30% annually\n")
##    • Upgrade to Loyal: 25-30% annually
cat("   • Purchase frequency increase: +40%\n")
##    • Purchase frequency increase: +40%
cat("   • Revenue per customer increase: +50%\n")
##    • Revenue per customer increase: +50%
cat("   • Email engagement rate: >25%\n\n")
##    • Email engagement rate: >25%
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

5. New Customers: Set Up for Success

# Calculate New Customers metrics
new_metrics <- customer_rfm %>%
  filter(Segment == "New Customers") %>%
  summarize(
    Total_Customers = n(),
    Pct_of_Base = round(n() / nrow(customer_rfm) * 100, 2),
    Avg_First_Purchase = mean(TotalSpent)
  )

cat("NEW CUSTOMERS SEGMENT STRATEGY\n")
## NEW CUSTOMERS SEGMENT STRATEGY
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers (%.1f%% of base)\n", 
            new_metrics$Total_Customers,
            new_metrics$Pct_of_Base))
##   • 266 customers (6.1% of base)
cat(sprintf("  • Average first purchase: %s\n",
            dollar(new_metrics$Avg_First_Purchase, prefix = "£")))
##   • Average first purchase: £217.65
cat("\n📋 RECOMMENDED ACTIONS:\n\n")
## 
## 📋 RECOMMENDED ACTIONS:
cat("1. Stellar Onboarding Experience\n")
## 1. Stellar Onboarding Experience
cat("   • Welcome email series (Days 0, 3, 7, 14, 30)\n")
##    • Welcome email series (Days 0, 3, 7, 14, 30)
cat("   • Product education and tips\n")
##    • Product education and tips
cat("   • Brand story and values communication\n")
##    • Brand story and values communication
cat("   • Set expectations for future communications\n\n")
##    • Set expectations for future communications
cat("2. Second Purchase Incentive\n")
## 2. Second Purchase Incentive
cat("   • 15% off second purchase (critical conversion point)\n")
##    • 15% off second purchase (critical conversion point)
cat("   • Time-limited offer (30 days)\n")
##    • Time-limited offer (30 days)
cat("   • Complementary product recommendations\n")
##    • Complementary product recommendations
cat("   • Expected conversion: 40-50%\n\n")
##    • Expected conversion: 40-50%
cat("3. Build Relationship Early\n")
## 3. Build Relationship Early
cat("   • Request feedback on first purchase\n")
##    • Request feedback on first purchase
cat("   • Invite to follow social media\n")
##    • Invite to follow social media
cat("   • Optional loyalty program enrollment\n")
##    • Optional loyalty program enrollment
cat("   • Set communication preferences\n\n")
##    • Set communication preferences
cat("4. Prevent Early Churn\n")
## 4. Prevent Early Churn
cat("   • Monitor for signs of dissatisfaction\n")
##    • Monitor for signs of dissatisfaction
cat("   • Proactive customer service outreach\n")
##    • Proactive customer service outreach
cat("   • Address any issues immediately\n\n")
##    • Address any issues immediately
cat("💰 BUDGET ALLOCATION: 10% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 10% of marketing budget
cat("   • Moderate investment with long-term payoff\n")
##    • Moderate investment with long-term payoff
cat("   • Prevention cheaper than re-acquisition\n\n")
##    • Prevention cheaper than re-acquisition
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • Second purchase rate: >50% within 60 days\n")
##    • Second purchase rate: >50% within 60 days
cat("   • Progression to Promising/Loyal: >35% within 6 months\n")
##    • Progression to Promising/Loyal: >35% within 6 months
cat("   • Email open rates: >35%\n")
##    • Email open rates: >35%
cat("   • Customer satisfaction score: >4.5/5\n\n")
##    • Customer satisfaction score: >4.5/5
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

6. Hibernating & Lost: Minimal Investment

# Calculate Hibernating and Lost metrics
hibernating_lost_metrics <- customer_rfm %>%
  filter(Segment %in% c("Hibernating", "Lost")) %>%
  group_by(Segment) %>%
  summarize(
    Total_Customers = n(),
    Total_Revenue = sum(TotalSpent)
  ) %>%
  summarize(
    Total_Customers = sum(Total_Customers),
    Total_Revenue = sum(Total_Revenue)
  )

cat("HIBERNATING & LOST SEGMENTS STRATEGY\n")
## HIBERNATING & LOST SEGMENTS STRATEGY
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Segment Overview:\n")
## Segment Overview:
cat(sprintf("  • %d customers combined\n", hibernating_lost_metrics$Total_Customers))
##   • 1203 customers combined
cat(sprintf("  • Historical revenue: %s (sunk cost)\n",
            dollar(hibernating_lost_metrics$Total_Revenue, prefix = "£")))
##   • Historical revenue: £374,187 (sunk cost)
cat("\n📋 RECOMMENDED ACTIONS:\n\n")
## 
## 📋 RECOMMENDED ACTIONS:
cat("1. Low-Cost Reactivation Attempts\n")
## 1. Low-Cost Reactivation Attempts
cat("   • Quarterly automated email campaigns\n")
##    • Quarterly automated email campaigns
cat("   • Deep discount offers (40-50% off)\n")
##    • Deep discount offers (40-50% off)
cat("   • \"Last chance\" messaging\n")
##    • "Last chance" messaging
cat("   • Expected win-back: <10%\n\n")
##    • Expected win-back: <10%
cat("2. List Hygiene\n")
## 2. List Hygiene
cat("   • Sunset policy: Remove non-responders after 12 months\n")
##    • Sunset policy: Remove non-responders after 12 months
cat("   • Maintain email deliverability\n")
##    • Maintain email deliverability
cat("   • GDPR compliance (right to be forgotten)\n\n")
##    • GDPR compliance (right to be forgotten)
cat("3. Learn From Losses\n")
## 3. Learn From Losses
cat("   • Analyze reasons for churn\n")
##    • Analyze reasons for churn
cat("   • Identify patterns or common issues\n")
##    • Identify patterns or common issues
cat("   • Prevent similar losses in active segments\n\n")
##    • Prevent similar losses in active segments
cat("💰 BUDGET ALLOCATION: 5% of marketing budget\n")
## 💰 BUDGET ALLOCATION: 5% of marketing budget
cat("   • Minimal investment - low probability of success\n")
##    • Minimal investment - low probability of success
cat("   • Resources better spent on active segments\n\n")
##    • Resources better spent on active segments
cat("📊 SUCCESS METRICS:\n")
## 📊 SUCCESS METRICS:
cat("   • If win-back rate >10%: Increase investment\n")
##    • If win-back rate >10%: Increase investment
cat("   • If win-back rate <5%: Stop investing\n")
##    • If win-back rate <5%: Stop investing
cat("   • Focus resources elsewhere\n\n")
##    • Focus resources elsewhere
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

Marketing Budget Allocation Summary

# Create budget allocation summary
budget_allocation <- tribble(
  ~Segment, ~Budget_Pct, ~Rationale, ~Expected_ROI,
  "Champions", 30, "Highest value, protect & grow", "5-7x",
  "Loyal Customers", 25, "High volume, upgrade potential", "4-6x",
  "At Risk", 20, "High stakes recovery", "3-5x",
  "Potential Loyalists", 20, "Largest growth opportunity", "3-4x",
  "New Customers", 10, "Future pipeline", "2-3x",
  "Promising", 8, "Moderate potential", "2-3x",
  "Need Attention", 7, "Prevent churn", "2-3x",
  "Hibernating/Lost", 5, "Low probability recovery", "1-2x",
  "Other Segments", 5, "Opportunistic", "1-2x"
) %>%
  mutate(Budget_Pct = Budget_Pct / sum(Budget_Pct) * 100)  # Normalize to 100%

cat("RECOMMENDED MARKETING BUDGET ALLOCATION\n")
## RECOMMENDED MARKETING BUDGET ALLOCATION
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
budget_allocation %>%
  arrange(desc(Budget_Pct)) %>%
  kable(
    col.names = c("Segment", "Budget %", "Rationale", "Expected ROI"),
    caption = "Strategic Budget Allocation by Customer Segment"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Strategic Budget Allocation by Customer Segment
Segment Budget % Rationale Expected ROI
Champions 23.076923 Highest value, protect & grow 5-7x
Loyal Customers 19.230769 High volume, upgrade potential 4-6x
At Risk 15.384615 High stakes recovery 3-5x
Potential Loyalists 15.384615 Largest growth opportunity 3-4x
New Customers 7.692308 Future pipeline 2-3x
Promising 6.153846 Moderate potential 2-3x
Need Attention 5.384615 Prevent churn 2-3x
Hibernating/Lost 3.846154 Low probability recovery 1-2x
Other Segments 3.846154 Opportunistic 1-2x
cat("\n\n💡 KEY PRINCIPLES:\n")
## 
## 
## 💡 KEY PRINCIPLES:
cat("   • Invest most in high-value, high-probability segments\n")
##    • Invest most in high-value, high-probability segments
cat("   • Balance retention (Champions, Loyal) with growth (Potential)\n")
##    • Balance retention (Champions, Loyal) with growth (Potential)
cat("   • Urgent intervention for At Risk (prevent revenue loss)\n")
##    • Urgent intervention for At Risk (prevent revenue loss)
cat("   • Minimal spend on low-probability recoveries\n")
##    • Minimal spend on low-probability recoveries
cat("   • Measure and adjust based on actual ROI\n\n")
##    • Measure and adjust based on actual ROI

Expected Business Impact

Let’s quantify the potential impact of implementing these recommendations.

# Calculate expected impact by segment
impact_analysis <- customer_rfm %>%
  group_by(Segment) %>%
  summarize(
    Current_Customers = n(),
    Current_Revenue = sum(TotalSpent)
  ) %>%
  mutate(
    # Expected uplift by segment (conservative estimates)
    Revenue_Uplift_Pct = case_when(
      Segment == "Champions" ~ 20,           # Upsell success
      Segment == "Loyal Customers" ~ 25,     # Upgrade to Champions
      Segment == "Potential Loyalists" ~ 40, # Nurture to Loyal
      Segment == "Promising" ~ 30,           # Engagement increase
      Segment == "New Customers" ~ 50,       # Second purchase
      Segment == "Need Attention" ~ 20,      # Re-engagement
      Segment == "At Risk" ~ 40,             # Win-back recovery
      Segment == "About to Sleep" ~ 15,      # Prevent churn
      Segment == "Can't Lose Them" ~ 35,     # Aggressive win-back
      Segment == "Hibernating" ~ 5,          # Minimal recovery
      Segment == "Lost" ~ 3,                 # Very low recovery
      TRUE ~ 10
    ),
    Expected_New_Revenue = Current_Revenue * (Revenue_Uplift_Pct / 100),
    Total_Expected_Revenue = Current_Revenue + Expected_New_Revenue
  ) %>%
  arrange(desc(Expected_New_Revenue))

cat("PROJECTED REVENUE IMPACT (12-MONTH HORIZON)\n")
## PROJECTED REVENUE IMPACT (12-MONTH HORIZON)
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("Current State:\n")
## Current State:
cat(sprintf("  • Total current revenue: %s\n",
            dollar(sum(impact_analysis$Current_Revenue), prefix = "£")))
##   • Total current revenue: £8,887,209
cat("\nProjected Impact:\n")
## 
## Projected Impact:
cat(sprintf("  • Total new revenue from initiatives: %s\n",
            dollar(sum(impact_analysis$Expected_New_Revenue), prefix = "£")))
##   • Total new revenue from initiatives: £2,018,999
cat(sprintf("  • Total projected revenue: %s\n",
            dollar(sum(impact_analysis$Total_Expected_Revenue), prefix = "£")))
##   • Total projected revenue: £10,906,208
cat(sprintf("  • Overall revenue increase: %.1f%%\n\n",
            (sum(impact_analysis$Expected_New_Revenue) / sum(impact_analysis$Current_Revenue)) * 100))
##   • Overall revenue increase: 22.7%
# Show top contributing segments
cat("Top 5 Revenue Growth Contributors:\n\n")
## Top 5 Revenue Growth Contributors:
impact_analysis %>%
  select(Segment, Current_Revenue, Revenue_Uplift_Pct, Expected_New_Revenue) %>%
  head(5) %>%
  mutate(
    Current_Revenue = dollar(Current_Revenue, prefix = "£"),
    Expected_New_Revenue = dollar(Expected_New_Revenue, prefix = "£")
  ) %>%
  kable(
    col.names = c("Segment", "Current Revenue", "Expected Uplift %", "New Revenue")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  print()
## <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:left;"> Segment </th>
##    <th style="text-align:left;"> Current Revenue </th>
##    <th style="text-align:right;"> Expected Uplift % </th>
##    <th style="text-align:left;"> New Revenue </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;"> Champions </td>
##    <td style="text-align:left;"> £5,625,826 </td>
##    <td style="text-align:right;"> 20 </td>
##    <td style="text-align:left;"> £1,125,165 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> At Risk </td>
##    <td style="text-align:left;"> £746,752 </td>
##    <td style="text-align:right;"> 40 </td>
##    <td style="text-align:left;"> £298,701 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Loyal Customers </td>
##    <td style="text-align:left;"> £970,745 </td>
##    <td style="text-align:right;"> 25 </td>
##    <td style="text-align:left;"> £242,686 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Need Attention </td>
##    <td style="text-align:left;"> £484,768 </td>
##    <td style="text-align:right;"> 20 </td>
##    <td style="text-align:left;"> £96,954 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Potential Loyalists </td>
##    <td style="text-align:left;"> £223,294 </td>
##    <td style="text-align:right;"> 40 </td>
##    <td style="text-align:left;"> £89,318 </td>
##   </tr>
## </tbody>
## </table>
cat("\n", rep("=", 70), "\n\n", sep = "")
## 
## ======================================================================

Implementation Roadmap

cat("90-DAY IMPLEMENTATION ROADMAP\n")
## 90-DAY IMPLEMENTATION ROADMAP
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================
cat("🗓️  WEEK 1-2: IMMEDIATE ACTIONS (Quick Wins)\n")
## 🗓️  WEEK 1-2: IMMEDIATE ACTIONS (Quick Wins)
cat("   ✓ Launch At Risk win-back campaign (URGENT)\n")
##    ✓ Launch At Risk win-back campaign (URGENT)
cat("   ✓ Set up Champions VIP program\n")
##    ✓ Set up Champions VIP program
cat("   ✓ Deploy New Customer welcome series\n")
##    ✓ Deploy New Customer welcome series
cat("   ✓ Create segment-based email lists\n")
##    ✓ Create segment-based email lists
cat("   ✓ Brief customer service team on segment priorities\n\n")
##    ✓ Brief customer service team on segment priorities
cat("🗓️  WEEK 3-4: CORE PROGRAMS\n")
## 🗓️  WEEK 3-4: CORE PROGRAMS
cat("   ✓ Launch loyalty points program (Loyal Customers)\n")
##    ✓ Launch loyalty points program (Loyal Customers)
cat("   ✓ Implement referral program (Champions)\n")
##    ✓ Implement referral program (Champions)
cat("   ✓ Start personalized product recommendations\n")
##    ✓ Start personalized product recommendations
cat("   ✓ Set up automated nurture campaigns (Potential Loyalists)\n\n")
##    ✓ Set up automated nurture campaigns (Potential Loyalists)
cat("🗓️  WEEK 5-8: OPTIMIZATION\n")
## 🗓️  WEEK 5-8: OPTIMIZATION
cat("   ✓ Analyze early campaign results\n")
##    ✓ Analyze early campaign results
cat("   ✓ A/B test messaging and offers\n")
##    ✓ A/B test messaging and offers
cat("   ✓ Refine segment definitions based on performance\n")
##    ✓ Refine segment definitions based on performance
cat("   ✓ Expand successful campaigns\n")
##    ✓ Expand successful campaigns
cat("   ✓ Pause/adjust underperforming initiatives\n\n")
##    ✓ Pause/adjust underperforming initiatives
cat("🗓️  WEEK 9-12: SCALE & MEASURE\n")
## 🗓️  WEEK 9-12: SCALE & MEASURE
cat("   ✓ Full deployment of all segment strategies\n")
##    ✓ Full deployment of all segment strategies
cat("   ✓ Monthly RFM re-calculation and reporting\n")
##    ✓ Monthly RFM re-calculation and reporting
cat("   ✓ Track customer segment migrations\n")
##    ✓ Track customer segment migrations
cat("   ✓ Calculate actual vs. expected ROI\n")
##    ✓ Calculate actual vs. expected ROI
cat("   ✓ Plan Q2 initiatives based on learnings\n\n")
##    ✓ Plan Q2 initiatives based on learnings
cat("📊 SUCCESS DASHBOARD (Track Monthly):\n")
## 📊 SUCCESS DASHBOARD (Track Monthly):
cat("   • Segment distribution changes\n")
##    • Segment distribution changes
cat("   • Revenue by segment\n")
##    • Revenue by segment
cat("   • Customer upgrade/downgrade rates\n")
##    • Customer upgrade/downgrade rates
cat("   • Campaign response rates by segment\n")
##    • Campaign response rates by segment
cat("   • Customer lifetime value trends\n")
##    • Customer lifetime value trends
cat("   • Overall retention rate\n\n")
##    • Overall retention rate
cat(rep("=", 70), "\n\n", sep = "")
## ======================================================================

Export Segment Customer Lists

Create CSV files for each key segment to enable marketing campaigns.

# Create reports directory if it doesn't exist
reports_dir <- here("reports", "rfm_segments")
if(!dir.exists(reports_dir)) {
  dir.create(reports_dir, recursive = TRUE)
}

cat("Exporting customer segment lists for marketing campaigns...\n\n")
## Exporting customer segment lists for marketing campaigns...
# Define priority segments to export
priority_segments <- c(
  "Champions",
  "Loyal Customers",
  "Potential Loyalists",
  "At Risk",
  "Can't Lose Them",
  "New Customers",
  "Promising",
  "Need Attention"
)

# Export each segment
for(seg in priority_segments) {
  # Filter and select relevant columns
  segment_data <- customer_rfm %>%
    filter(Segment == seg) %>%
    select(
      CustomerID,
      Segment,
      RFM_Score,
      R_Score,
      F_Score,
      M_Score,
      DaysSinceLastPurchase,
      TotalTransactions,
      TotalSpent,
      AverageOrderValue,
      FirstPurchaseDate,
      LastPurchaseDate,
      PrimaryCountry
    ) %>%
    arrange(desc(TotalSpent))  # Sort by value
  
  # Create filename
  filename <- paste0(gsub(" ", "_", tolower(seg)), "_customers.csv")
  filepath <- here(reports_dir, filename)
  
  # Export
  write_csv(segment_data, filepath)
  
  cat(sprintf("  ✓ Exported: %s (%d customers)\n", filename, nrow(segment_data)))
}
##   ✓ Exported: champions_customers.csv (848 customers)
##   ✓ Exported: loyal_customers_customers.csv (411 customers)
##   ✓ Exported: potential_loyalists_customers.csv (415 customers)
##   ✓ Exported: at_risk_customers.csv (428 customers)
##   ✓ Exported: can't_lose_them_customers.csv (96 customers)
##   ✓ Exported: new_customers_customers.csv (266 customers)
##   ✓ Exported: promising_customers.csv (137 customers)
##   ✓ Exported: need_attention_customers.csv (291 customers)
cat("\n📁 All segment lists exported to: ", reports_dir, "\n\n")
## 
## 📁 All segment lists exported to:  /Users/joaovictorgomes/Documents/ecommerce-customer-analytics/reports/rfm_segments
cat("FILES READY FOR MARKETING TEAMS:\n")
## FILES READY FOR MARKETING TEAMS:
cat("  • Use for email campaign targeting\n")
##   • Use for email campaign targeting
cat("  • Import into CRM systems\n")
##   • Import into CRM systems
cat("  • Create segment-specific audiences in ad platforms\n")
##   • Create segment-specific audiences in ad platforms
cat("  • Personalize communications based on RFM scores\n\n")
##   • Personalize communications based on RFM scores

Final Strategic Summary

cat("\n\n")
cat(rep("=", 70), "\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
cat("                    FINAL STRATEGIC SUMMARY\n")
##                     FINAL STRATEGIC SUMMARY
cat(rep("=", 70), "\n\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
# Calculate key summary metrics
total_customers <- nrow(customer_rfm)
total_revenue <- sum(customer_rfm$TotalSpent)

high_value_customers <- sum(customer_rfm$Segment %in% c("Champions", "Loyal Customers"))
high_value_revenue <- sum((customer_rfm %>% 
  filter(Segment %in% c("Champions", "Loyal Customers")))$TotalSpent)

at_risk_customers <- sum(customer_rfm$Segment %in% c("At Risk", "Can't Lose Them"))
at_risk_revenue <- sum((customer_rfm %>% 
  filter(Segment %in% c("At Risk", "Can't Lose Them")))$TotalSpent)

growth_customers <- sum(customer_rfm$Segment %in% c("Potential Loyalists", "Promising", "New Customers"))
growth_revenue <- sum((customer_rfm %>% 
  filter(Segment %in% c("Potential Loyalists", "Promising", "New Customers")))$TotalSpent)

churned_customers <- sum(customer_rfm$Segment %in% c("Hibernating", "Lost"))

cat("🎯 KEY FINDINGS:\n\n")
## 🎯 KEY FINDINGS:
cat("1. CUSTOMER BASE COMPOSITION\n")
## 1. CUSTOMER BASE COMPOSITION
cat(sprintf("   • Total customers analyzed: %s\n", comma(total_customers)))
##    • Total customers analyzed: 4,371
cat(sprintf("   • High-value customers (Champions + Loyal): %s (%.1f%%)\n",
            comma(high_value_customers),
            high_value_customers / total_customers * 100))
##    • High-value customers (Champions + Loyal): 1,259 (28.8%)
cat(sprintf("   • At-risk high-value customers: %s (%.1f%%)\n",
            comma(at_risk_customers),
            at_risk_customers / total_customers * 100))
##    • At-risk high-value customers: 524 (12.0%)
cat(sprintf("   • Growth opportunity customers: %s (%.1f%%)\n",
            comma(growth_customers),
            growth_customers / total_customers * 100))
##    • Growth opportunity customers: 818 (18.7%)
cat(sprintf("   • Churned customers: %s (%.1f%%)\n\n",
            comma(churned_customers),
            churned_customers / total_customers * 100))
##    • Churned customers: 1,203 (27.5%)
cat("2. REVENUE ANALYSIS\n")
## 2. REVENUE ANALYSIS
cat(sprintf("   • Total revenue: %s\n",
            dollar(total_revenue, prefix = "£")))
##    • Total revenue: £8,887,209
cat(sprintf("   • High-value segment revenue: %s (%.1f%%)\n",
            dollar(high_value_revenue, prefix = "£"),
            high_value_revenue / total_revenue * 100))
##    • High-value segment revenue: £6,596,571 (74.2%)
cat(sprintf("   • Revenue at risk: %s (%.1f%%)\n",
            dollar(at_risk_revenue, prefix = "£"),
            at_risk_revenue / total_revenue * 100))
##    • Revenue at risk: £954,332 (10.7%)
cat(sprintf("   • Revenue concentration (top 20%% customers): ~%.0f%%\n\n",
            point_80$Cumulative_Revenue_Pct))
##    • Revenue concentration (top 20% customers): ~80%
cat("3. CRITICAL BUSINESS PRIORITIES\n\n")
## 3. CRITICAL BUSINESS PRIORITIES
cat("   🔴 URGENT (Month 1):\n")
##    🔴 URGENT (Month 1):
cat(sprintf("      • Launch At Risk win-back campaign (%s at stake)\n",
            dollar(at_risk_revenue, prefix = "£")))
##       • Launch At Risk win-back campaign (£954,332 at stake)
cat("      • Implement Champions retention program\n")
##       • Implement Champions retention program
cat("      • Deploy New Customer onboarding\n\n")
##       • Deploy New Customer onboarding
cat("   🟡 HIGH PRIORITY (Month 2-3):\n")
##    🟡 HIGH PRIORITY (Month 2-3):
cat("      • Build Loyal → Champions upgrade path\n")
##       • Build Loyal → Champions upgrade path
cat("      • Nurture Potential Loyalists to Loyal status\n")
##       • Nurture Potential Loyalists to Loyal status
cat("      • Establish loyalty & referral programs\n\n")
##       • Establish loyalty & referral programs
cat("   🟢 ONGOING (Month 3+):\n")
##    🟢 ONGOING (Month 3+):
cat("      • Monitor segment migrations monthly\n")
##       • Monitor segment migrations monthly
cat("      • Optimize campaigns based on performance\n")
##       • Optimize campaigns based on performance
cat("      • Expand successful programs\n\n")
##       • Expand successful programs
cat("4. EXPECTED 12-MONTH IMPACT\n")
## 4. EXPECTED 12-MONTH IMPACT
cat(sprintf("   • Projected revenue increase: %s (%.1f%% growth)\n",
            dollar(sum(impact_analysis$Expected_New_Revenue), prefix = "£"),
            (sum(impact_analysis$Expected_New_Revenue) / total_revenue) * 100))
##    • Projected revenue increase: £2,018,999 (22.7% growth)
cat(sprintf("   • Projected total revenue: %s\n",
            dollar(sum(impact_analysis$Total_Expected_Revenue), prefix = "£")))
##    • Projected total revenue: £10,906,208
cat("   • Improved customer retention: 15-25%\n")
##    • Improved customer retention: 15-25%
cat("   • Increased customer lifetime value: 20-30%\n\n")
##    • Increased customer lifetime value: 20-30%
cat("5. SUCCESS FACTORS\n")
## 5. SUCCESS FACTORS
cat("   ✓ Segment-specific strategies (not one-size-fits-all)\n")
##    ✓ Segment-specific strategies (not one-size-fits-all)
cat("   ✓ Data-driven budget allocation\n")
##    ✓ Data-driven budget allocation
cat("   ✓ Focus on high-value customers first\n")
##    ✓ Focus on high-value customers first
cat("   ✓ Urgent intervention for at-risk customers\n")
##    ✓ Urgent intervention for at-risk customers
cat("   ✓ Continuous monitoring and optimization\n")
##    ✓ Continuous monitoring and optimization
cat("   ✓ Cross-functional alignment (marketing, service, product)\n\n")
##    ✓ Cross-functional alignment (marketing, service, product)
cat(rep("=", 70), "\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
cat("                    RFM ANALYSIS COMPLETE\n")
##                     RFM ANALYSIS COMPLETE
cat(rep("=", 70), "\n\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Next Steps & Recommendations

cat("📋 IMMEDIATE NEXT STEPS:\n\n")
## 📋 IMMEDIATE NEXT STEPS:
cat("1. STAKEHOLDER PRESENTATION (This Week)\n")
## 1. STAKEHOLDER PRESENTATION (This Week)
cat("   • Present findings to marketing and executive teams\n")
##    • Present findings to marketing and executive teams
cat("   • Get buy-in for recommended budget allocation\n")
##    • Get buy-in for recommended budget allocation
cat("   • Assign ownership for each segment strategy\n\n")
##    • Assign ownership for each segment strategy
cat("2. CAMPAIGN SETUP (Week 1-2)\n")
## 2. CAMPAIGN SETUP (Week 1-2)
cat("   • Use exported customer lists for targeting\n")
##    • Use exported customer lists for targeting
cat("   • Set up email marketing automation\n")
##    • Set up email marketing automation
cat("   • Brief creative team on segment messaging\n")
##    • Brief creative team on segment messaging
cat("   • Configure tracking and analytics\n\n")
##    • Configure tracking and analytics
cat("3. MEASUREMENT FRAMEWORK (Week 2-3)\n")
## 3. MEASUREMENT FRAMEWORK (Week 2-3)
cat("   • Define KPIs for each segment\n")
##    • Define KPIs for each segment
cat("   • Set up monthly RFM refresh process\n")
##    • Set up monthly RFM refresh process
cat("   • Create executive dashboard\n")
##    • Create executive dashboard
cat("   • Schedule monthly review meetings\n\n")
##    • Schedule monthly review meetings
cat("4. ONGOING OPTIMIZATION (Monthly)\n")
## 4. ONGOING OPTIMIZATION (Monthly)
cat("   • Re-calculate RFM scores\n")
##    • Re-calculate RFM scores
cat("   • Track segment migrations\n")
##    • Track segment migrations
cat("   • Measure campaign ROI by segment\n")
##    • Measure campaign ROI by segment
cat("   • Adjust strategies based on performance\n\n")
##    • Adjust strategies based on performance
cat("🎓 ADDITIONAL ANALYSES TO CONSIDER:\n\n")
## 🎓 ADDITIONAL ANALYSES TO CONSIDER:
cat("   • Product Affinity Analysis (Market Basket)\n")
##    • Product Affinity Analysis (Market Basket)
cat("     → Identify cross-sell opportunities by segment\n\n")
##      → Identify cross-sell opportunities by segment
cat("   • Predictive CLV Modeling\n")
##    • Predictive CLV Modeling
cat("     → Forecast future customer value\n\n")
##      → Forecast future customer value
cat("   • Churn Prediction Model\n")
##    • Churn Prediction Model
cat("     → Identify at-risk customers earlier\n\n")
##      → Identify at-risk customers earlier
cat("   • Geographic Segmentation\n")
##    • Geographic Segmentation
cat("     → Region-specific strategies\n\n")
##      → Region-specific strategies
cat("   • Channel Preference Analysis\n")
##    • Channel Preference Analysis
cat("     → Optimize communication channels by segment\n\n")
##      → Optimize communication channels by segment

Documentation Complete

cat("\n\n")
cat(rep("=", 70), "\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
cat("              ✓ RFM ANALYSIS PROJECT COMPLETE ✓\n")
##               ✓ RFM ANALYSIS PROJECT COMPLETE ✓
cat(rep("=", 70), "\n\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
cat("📊 DELIVERABLES CREATED:\n\n")
## 📊 DELIVERABLES CREATED:
cat("Analysis Outputs:\n")
## Analysis Outputs:
cat("  ✓ Customer RFM scores calculated (", comma(nrow(customer_rfm)), " customers)\n", sep = "")
##   ✓ Customer RFM scores calculated (4,371 customers)
cat("  ✓ 11 strategic customer segments defined\n")
##   ✓ 11 strategic customer segments defined
cat("  ✓ 7+ professional visualizations created\n")
##   ✓ 7+ professional visualizations created
cat("  ✓ Comprehensive segment analysis completed\n\n")
##   ✓ Comprehensive segment analysis completed
cat("Business Recommendations:\n")
## Business Recommendations:
cat("  ✓ Segment-specific action plans (", length(priority_segments), " segments)\n", sep = "")
##   ✓ Segment-specific action plans (8 segments)
cat("  ✓ Marketing budget allocation strategy\n")
##   ✓ Marketing budget allocation strategy
cat("  ✓ 90-day implementation roadmap\n")
##   ✓ 90-day implementation roadmap
cat("  ✓ Expected business impact quantified\n\n")
##   ✓ Expected business impact quantified
cat("Actionable Exports:\n")
## Actionable Exports:
cat("  ✓ Customer segment lists (", length(priority_segments), " CSV files)\n", sep = "")
##   ✓ Customer segment lists (8 CSV files)
cat("  ✓ Campaign targeting data ready\n")
##   ✓ Campaign targeting data ready
cat("  ✓ Executive summary prepared\n\n")
##   ✓ Executive summary prepared
cat("Files Generated:\n")
## Files Generated:
cat("  • notebooks/03_rfm_analysis.Rmd (this file)\n")
##   • notebooks/03_rfm_analysis.Rmd (this file)
cat("  • notebooks/03_rfm_analysis.html (knitted report)\n")
##   • notebooks/03_rfm_analysis.html (knitted report)
cat("  • data/processed/customer_rfm_scored.csv\n")
##   • data/processed/customer_rfm_scored.csv
cat("  • data/processed/rfm_segment_summary.csv\n")
##   • data/processed/rfm_segment_summary.csv
cat("  • reports/rfm_segments/*.csv (", length(priority_segments), " files)\n\n", sep = "")
##   • reports/rfm_segments/*.csv (8 files)
cat("📈 PROJECT IMPACT:\n")
## 📈 PROJECT IMPACT:
cat(sprintf("  • Revenue growth potential: %s\n",
            dollar(sum(impact_analysis$Expected_New_Revenue), prefix = "£")))
##   • Revenue growth potential: £2,018,999
cat(sprintf("  • Revenue increase: %.1f%%\n",
            (sum(impact_analysis$Expected_New_Revenue) / sum(customer_rfm$TotalSpent)) * 100))
##   • Revenue increase: 22.7%
cat(sprintf("  • At-risk revenue identified: %s\n",
            dollar(at_risk_revenue, prefix = "£")))
##   • At-risk revenue identified: £954,332
cat(sprintf("  • Growth opportunity quantified: %s\n",
            dollar(growth_revenue, prefix = "£")))
##   • Growth opportunity quantified: £403,270
cat("\n🎯 THIS ANALYSIS ENABLES:\n")
## 
## 🎯 THIS ANALYSIS ENABLES:
cat("  • Targeted marketing campaigns by customer value\n")
##   • Targeted marketing campaigns by customer value
cat("  • Optimized budget allocation (ROI-driven)\n")
##   • Optimized budget allocation (ROI-driven)
cat("  • Proactive churn prevention\n")
##   • Proactive churn prevention
cat("  • Data-driven customer lifecycle management\n")
##   • Data-driven customer lifecycle management
cat("  • Measurable business outcomes\n\n")
##   • Measurable business outcomes
cat(rep("=", 70), "\n\n")
## = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Session Information

# Document R version and packages for reproducibility
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DT_0.34.0         kableExtra_1.4.0  knitr_1.49        ggthemes_5.1.0   
##  [5] viridis_0.6.5     viridisLite_0.4.2 scales_1.3.0      here_1.0.1       
##  [9] lubridate_1.9.3   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4      
## [13] purrr_1.0.2       readr_2.1.5       tidyr_1.3.1       tibble_3.2.1     
## [17] ggplot2_3.5.2     tidyverse_2.0.0  
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    xml2_1.3.6       
##  [5] stringi_1.8.4     hms_1.1.3         digest_0.6.37     magrittr_2.0.3   
##  [9] evaluate_1.0.1    grid_4.4.2        timechange_0.3.0  fastmap_1.2.0    
## [13] rprojroot_2.0.4   jsonlite_1.8.9    gridExtra_2.3     fansi_1.0.6      
## [17] crosstalk_1.2.1   textshaping_0.4.0 jquerylib_0.1.4   cli_3.6.3        
## [21] crayon_1.5.3      rlang_1.1.4       bit64_4.5.2       munsell_0.5.1    
## [25] withr_3.0.2       cachem_1.1.0      yaml_2.3.10       parallel_4.4.2   
## [29] tools_4.4.2       tzdb_0.4.0        colorspace_2.1-1  vctrs_0.6.5      
## [33] R6_2.5.1          lifecycle_1.0.4   bit_4.5.0         htmlwidgets_1.6.4
## [37] vroom_1.6.5       pkgconfig_2.0.3   pillar_1.9.0      bslib_0.8.0      
## [41] gtable_0.3.6      glue_1.8.0        systemfonts_1.2.3 xfun_0.49        
## [45] tidyselect_1.2.1  rstudioapi_0.17.1 farver_2.1.2      htmltools_0.5.8.1
## [49] labeling_0.4.3    rmarkdown_2.29    svglite_2.2.1     compiler_4.4.2

🎉 RFM ANALYSIS COMPLETE! 🎉